Video Analytics Components to Build an AI Mama for Your Kids

Pallawi
9 min read · Dec 15, 2023

This project is my journey of building my own strength and giving strength to families around the world. It is said that you cannot pour from an empty cup, so this blog is a stepping stone towards filling my cup of knowledge with the best practical and scalable solutions for building the AI Mama.

In this blog, I will list the AI components that will eventually process the data at various steps in the project.

This blog is also a call for contribution. Let AI serve and uplift society. Please support this noble project by suggesting components you think I have missed and could be needed for a job to be done.

The result of the first brainstorming session

If you have read my previous blog, you already know the real-life experience that motivated me to take up this project.

  1. We concluded that a child’s responses, whether triggered by a stimulus or not, are unpredictable. We need strong parameters and metrics to categorise the events driven by those responses.
  2. A babysitter, in contrast, is an adult, so we expect no responses without a stimulus. When a babysitter does respond, the response has to be reasonable and backed by maturity, responsibility, rational thinking, professional training, self-awareness, compassion, ethical and moral judgment, and more.

These are the two icebreaking thoughts to begin the project with. We encourage your participation in this initial brainstorming.

List of components for the project:

Building and curating our datasets

Below are the prompts which I have used as input to Generative AI to create initial test videos. In the next blog, you will see the outputs from multiple models using pre-trained weights.

Image dataset

1. Empty room
2. Room with objects
3. Room with objects and adult humans
4. Room with objects and a child (various ages)
5. Room with objects, children of various ages and an adult human
6. Room with objects, children of various ages and an adult human interacting through physical touch
7. Room with objects, children of various ages and an adult human interacting, with emotions visible on the faces of the child and the adult

Video dataset

1. An adult human in an empty room
2. An adult human in a room full of objects
3. An adult human in a room full of objects doing some activity: walking, jumping, running, dancing.
4. An adult human in a room full of objects, interacting with the objects: reading a book, talking on the phone, cleaning an object.
5. An adult human in a room full of objects, verbally and physically interacting with a child: feeding, reading, singing, watching TV.
6. A child playing without an adult in the room.
7. A child crying in a room.
8. A child laughing in a room.
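
To keep track of which scenarios the prompts above already cover, the planned clips could be recorded in a small manifest. This is only a minimal sketch: the `ClipSpec` class, its field names and the activity tags are hypothetical and not part of any existing tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipSpec:
    """One planned video clip. All field names are hypothetical."""
    prompt: str
    has_adult: bool
    has_child: bool
    activity: str  # free-text tag such as "walking" or "crying"


# A few entries adapted from the video prompts listed above.
MANIFEST = [
    ClipSpec("An adult human in an empty room", True, False, "none"),
    ClipSpec("An adult human in a room full of objects, walking", True, False, "walking"),
    ClipSpec("An adult human feeding a child", True, True, "feeding"),
    ClipSpec("A child playing without an adult in the room", False, True, "playing"),
    ClipSpec("A child crying in a room", False, True, "crying"),
    ClipSpec("A child laughing in a room", False, True, "laughing"),
]


def coverage(manifest):
    """Count clips per (has_adult, has_child) combination to spot gaps."""
    return Counter((clip.has_adult, clip.has_child) for clip in manifest)


if __name__ == "__main__":
    for combo, count in sorted(coverage(MANIFEST).items()):
        print(combo, count)
```

A count of zero or one for a combination is a hint that more prompts are needed for that scenario.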

We can either create our own benchmark dataset step by step or use multiple open-source datasets. Creating our own dataset means either partnering with a play school or hiring people to act out these scenes.

If you are reading this, I request you to help us by connecting us with people who would be interested in making this project a success by trusting our journey and helping us create our dataset.

Open-source datasets

This is a growing blog. I request readers to contribute here if you have come across a dataset similar to those described in the section above.

Deep learning models

The deep learning models below are not numbered in the order in which they appear in the project. I will present the architecture in upcoming blogs.

Now, let us list the state-of-the-art contenders for deep learning models.

All of the models that I have mentioned below are based on my practical experiences of working with them. Please feel welcome to suggest SOTA models for different work categories.

  1. Human pose estimation: YOLO-NAS Pose Detection, OpenPose, Pose Landmark Detection by MediaPipe, AlphaPose.
  2. Large language models: ChatGPT by OpenAI, OPT (Open Pretrained Transformer) by Meta, Bard by Google. These models can be accessed through an API at minimal cost and are a few excellent options for research projects.
  3. Object detection models: Detectron2 by Meta (GitHub), YOLOv8 by Ultralytics (GitHub), TensorFlow 2 Detection Model Zoo.
  4. Classification models: CoAtNet: Marrying Convolution and Attention for All Data Sizes (GitHub), Progressive Neural Architecture Search (PNAS), ResNeXt-10.
  5. Image captioning models: CoCa: Contrastive Captioners are Image-Text Foundation Models (GitHub).
  6. Video captioning models: mPLUG-2.
  7. Scene segmentation models: Detectron2’s PointRend (Point-based Rendering), mmsegmentation, Detectron2 Mask R-CNN.
  8. Optical flow models: for capturing the motion of objects between frames.
  9. Recurrent Neural Networks (RNNs) and LSTMs: for analysing temporal dependencies across video frames, for example in action recognition, anomaly detection, or temporal event segmentation in video streams.
  10. 3D CNNs: such as C3D, for capturing spatiotemporal features in videos.
  11. Action recognition models.
  12. Depth estimation models.
  13. Speech-to-text models.

The list above covers components that may be needed to build the video analytics project we are creating. At the same time, all of the modules can be reused for multiple purposes.
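
One way to keep such modules reusable is to put every model behind the same small interface and run them through a shared pipeline. The sketch below is only an illustration under my own assumptions: the `Pipeline` class and the two stub components are hypothetical stand-ins, not real model calls, and the project architecture may end up looking quite different.

```python
from typing import Any, Callable, Dict, List, Tuple

# A "frame" here is a plain dict; in practice it would carry image data.
Frame = Dict[str, Any]
Component = Callable[[Frame], Dict[str, Any]]


class Pipeline:
    """Runs named components in registration order and collects their outputs."""

    def __init__(self) -> None:
        self._components: List[Tuple[str, Component]] = []

    def register(self, name: str, component: Component) -> None:
        self._components.append((name, component))

    def run(self, frame: Frame) -> Dict[str, Any]:
        # Each component sees the same input frame; outputs are keyed by name.
        return {name: component(frame) for name, component in self._components}


# Hypothetical stubs standing in for an object detector and a pose estimator.
def fake_object_detector(frame: Frame) -> Dict[str, Any]:
    return {"labels": frame.get("objects", [])}


def fake_pose_estimator(frame: Frame) -> Dict[str, Any]:
    return {"num_people": frame.get("people", 0)}


if __name__ == "__main__":
    pipeline = Pipeline()
    pipeline.register("objects", fake_object_detector)
    pipeline.register("pose", fake_pose_estimator)
    print(pipeline.run({"objects": ["toy", "sofa"], "people": 2}))
```

Because every component takes a frame and returns a dict, the same detector or pose estimator could be registered in a different pipeline for a different purpose.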

The list will grow over time as I read, research and experiment.

Post-processing module

A post-processing module enhances the output coming from multiple models. It can comprise traditional computer vision, GIS libraries, and modules for result refinement, event detection and alerts, tracking and consolidation, report generation, and the user interface.
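
As an example of what an event detection and consolidation step might do, the sketch below turns noisy per-frame flags (say, "child crying detected in this frame") into events by keeping only runs that last long enough. The function name and the `min_frames` threshold are my own assumptions for illustration.

```python
from typing import List, Tuple


def consolidate_events(frame_flags: List[bool], min_frames: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end inclusive, for every run of
    True flags that is at least min_frames long. Shorter runs are treated
    as noise and dropped."""
    events: List[Tuple[int, int]] = []
    start = None
    for i, flag in enumerate(frame_flags):
        if flag and start is None:
            start = i  # a new run begins
        elif not flag and start is not None:
            if i - start >= min_frames:
                events.append((start, i - 1))
            start = None  # the run has ended
    # Close a run that reaches the last frame.
    if start is not None and len(frame_flags) - start >= min_frames:
        events.append((start, len(frame_flags) - 1))
    return events


if __name__ == "__main__":
    flags = [False, True, True, False, True, True, True, True, False, True, True, True]
    print(consolidate_events(flags))
```

An alert module could then fire once per consolidated event rather than once per frame.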

This list will grow as we go ahead with our research and implementations.

License check of deep learning models for commercialization

License types to be aware of: Proprietary License, Open Source Licenses (MIT License, GNU General Public License (GPL), Apache License 2.0, BSD License), Academic License, Commercial License, Dual Licensing, Creative Commons Licenses (for datasets or model outputs), Public Domain (No License), Custom License, Enterprise License, Developer License, Subscription License, Freemium License, End-User License Agreement (EULA), Royalty-Based License, Site License, Cloud Service License.

Licenses to check before using a deep learning model commercially

1. Apache License 2.0

The Apache License 2.0 is a popular open-source license used widely in the software industry, including for deep learning models. It’s known for being permissive, providing significant freedom to users while also offering important legal protections. Here are some key aspects of the Apache License 2.0:

1. Grant of Rights:
— The license grants users the right to use, reproduce, modify, distribute, display, and perform the work. This includes both the original and modified versions of the software.

2. Patent License:
— The Apache License 2.0 includes an express grant of patent rights from contributors to the users. If a contributor holds a patent in the software, they grant a royalty-free license to any patent claims they own or control that are infringed by the software.

3. Contributions and Copyright:
— Contributions made to the software are also licensed under the Apache License 2.0. Each contributor retains copyright to their contributions but grants the same broad rights to others.

4. Redistribution:
— The software can be redistributed in source or binary form. However, redistributions must include a copy of the license itself and a notice providing attribution to the original authors of the content.

5. Modification:
— Users are free to modify the software. Modified versions can be distributed under the same terms as the original software.

6. No Trademark License:
— The license does not grant permission to use the trade names, trademarks, service marks, or product names of the licensors, except as required for reasonable and customary use in describing the origin of the work.

7. No Warranty:
— The software is provided “as is”, without warranties or conditions of any kind, either express or implied, including any warranty of merchantability, fitness for a particular purpose, or non-infringement.

8. Protection Against Patent Litigation:
— The license includes a provision that terminates the patent licenses granted to a user for the work if that user starts patent litigation alleging that the work or a contribution to it infringes a patent.

9. State Changes:
— If the user distributes modified versions of the software, the modified files must carry prominent notices stating that they were changed.

The Apache License 2.0 is favoured by many in the open-source community for its simplicity and flexibility. It encourages open development and sharing while protecting both contributors and users from patent litigation risks. This makes it a popular choice for open-source projects, including those in the field of deep learning and AI.

2. Commercial License

A Commercial License in the context of software and deep learning models refers to a legal agreement that allows the licensee (typically a business entity) to use the software or model in a commercial setting. This type of license is distinct from open-source licenses and is often tailored to suit the specific needs of a commercial operation. Here are some key characteristics and components of Commercial Licenses:

1. Usage Rights:
— The license grants the right to use the software or model in a business, for-profit setting. This can include integration into commercial products, offering services based on the software, or using it internally for business operations.

2. Fees and Payment Terms:
— Commercial licenses typically require payment. This could be a one-time fee, a recurring subscription, or a royalty-based model where payments are tied to usage levels or revenue generated from the software.

3. Restrictions:
— The license may impose certain restrictions on the use of the software. For example, it may limit the number of users, the number of installations, the type of usage (e.g., only within a specific industry), or geographical restrictions.

4. Support and Maintenance:
— Commercial licenses often include provisions for ongoing support and maintenance. This can include access to customer service, regular updates, patches, and sometimes, customization services.

5. Warranty and Liability:
— Unlike many open-source licenses, commercial licenses often come with warranties regarding the performance of the software. Some clauses limit the liability of the software provider in case of software failure or other issues.

6. Intellectual Property Rights:
— The license clarifies that the intellectual property rights of the software remain with the provider. The commercial use does not transfer ownership of the software to the licensee.

7. Confidentiality:
— Commercial licenses may include confidentiality clauses to protect proprietary information, trade secrets, and other sensitive data related to the software.

8. Termination Clauses:
— These clauses define the conditions under which the license can be terminated, such as breach of contract, non-payment, or other specified circumstances.

9. Audit Rights:
— The licensor may retain the right to audit the licensee’s use of the software to ensure compliance with the terms of the license.

10. Renewal and Upgrades:
— The license may outline terms for renewal and access to upgraded versions of the software.

Commercial licenses are crucial for businesses that rely on proprietary software as part of their product offerings or internal processes. These licenses provide legal clarity and a framework for the commercial exploitation of software while ensuring that the rights and interests of both the software provider and the user are protected.

3. MIT License

The MIT License is a permissive free software license originating at the Massachusetts Institute of Technology (MIT). It imposes minimal restrictions: it allows commercial use, modification, and redistribution of the software, provided the copyright notice and the permission notice are included in all copies or substantial portions of the software. The license text does not mention trademarks and contains no express patent grant; it addresses only the software itself.

The MIT License’s simplicity and permissiveness have made it a popular choice for a wide range of software projects, from small open-source libraries to large-scale software frameworks. Its main appeal lies in the freedom it grants to users and developers, encouraging open sharing and collaboration in the software development community.
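
The license check described in this section could start as a simple inventory lookup that flags which models need a closer look before commercial use. This is a rough sketch under my own assumptions: the model names are hypothetical placeholders, the categories are a coarse simplification, and the actual license of each model and of its pre-trained weights has to be read from its repository.

```python
# SPDX-style license identifiers mapped to a coarse review category.
# This mapping is a simplification for triage, not a legal assessment.
LICENSE_CATEGORY = {
    "MIT": "permissive",
    "Apache-2.0": "permissive",
    "BSD-3-Clause": "permissive",
    "GPL-3.0-only": "copyleft",
    "AGPL-3.0-only": "copyleft",
}


def review_status(license_id: str) -> str:
    """Return a triage label for a license identifier."""
    category = LICENSE_CATEGORY.get(license_id)
    if category == "permissive":
        return "ok-with-attribution"
    if category == "copyleft":
        return "needs-legal-review"
    return "unknown-check-manually"


# Hypothetical inventory; the names and licenses are placeholders.
MODELS = {
    "pose_model_a": "Apache-2.0",
    "detector_b": "AGPL-3.0-only",
    "captioner_c": "Custom-Research-Only",
}

if __name__ == "__main__":
    for model, license_id in MODELS.items():
        print(f"{model}: {license_id} -> {review_status(license_id)}")
```

Anything that lands in "unknown-check-manually", such as a custom or academic license, would need to be read in full before the model goes into a commercial product.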

Conclusion:

I am finishing this blog here. This blog was an exercise for my brain. I learnt many new things and realized that I can use all of my experiences with different types of deep learning models, and data and error analysis techniques to build something that I would be proud to deliver to my society.

You can join me on this journey by sharing your work experiences and what you think about the blog as a whole. Feel free to share your valuable thoughts in the comment section.

I am excited and you will see me writing and publishing many blogs in this series of building “The AI Mama” for your kids.

Written by Pallawi

Computer Vision contributor. Lead Data Scientist @https://www.here.com/ Love Data Science.
