Which human pose estimation model should you pick to realise your ideas for a video analytics product in 2024?

Pallawi
8 min read · Dec 19, 2023


In this image, I have summarized the usability of the SOTA human pose estimation models on a few factors that can give you a kick start in deciding which one to begin your POC with.

The SOTA models may look lucrative and exciting to a business, but I do not want your business to miss out on the licence information before beginning a journey to build a successful and scalable product.

This blog will give you a clear understanding of which SOTA deep learning human pose estimation model you can choose to go ahead and build a quick POC (proof of concept) for your business in the year 2024.

I have focused on providing you with the licence details of these SOTA models.

I also recommend reading the research paper DeepPose, published by Google at CVPR 2014, if you are a beginner in the field of video analytics.

Appreciation and my thoughts on the commercial use of deep learning models

Appreciating a license that restricts the commercial use of deep learning models involves recognizing the balance it strikes between innovation and control. Such licenses foster a unique environment where academic and research-oriented work flourishes, allowing for open experimentation, learning, and collaboration without the pressure of commercial interests. This approach ensures that the core focus remains on advancing the field of artificial intelligence and deep learning through pure research and academic inquiry.

By limiting commercial exploitation, these licenses help maintain the integrity of the research, encouraging a deeper understanding and more thoughtful application of technology. This can lead to more groundbreaking discoveries and developments, as the knowledge and tools are shared openly in the community, building a strong foundation for future innovations rooted in a comprehensive and generous understanding of the technology.

OpenPose:

  1. If you wish to use these models for non-commercial activity, kindly look at this page. Carnegie Mellon University holds the licence. OpenPose is available for academic and non-profit organization non-commercial research use only.
  2. The non-exclusive commercial license requires a non-refundable USD 25,000 annual royalty.
  3. The non-exclusive commercial license cannot be used in the field of Sports. (“Sports” shall mean any athletic competition between individuals, groups of individuals or teams.)
  4. OpenPose provides several models: BODY_25 (25 body key points), a model trained on the COCO dataset, a model trained on the MPII dataset, a Hand model that detects key points of the hand, a Face model for facial landmark detection, and a Foot extension to BODY_25 that adds key points for the feet.
  5. You may want to see the engagement stats here. The GitHub link- https://github.com/CMU-Perceptual-Computing-Lab/openpose, GitHub Stars- 28.9k, GitHub Fork- 7.7k, Issues- 274.
I cannot believe this! Tough to digest!

YOLO-NAS:

  1. Commercial use, especially in production environments, is restricted unless otherwise agreed upon with Deci. Deci.AI holds the licence for YOLO-NAS.
  2. YOLO-NAS was developed using Deci’s neural architecture search technology. The models demonstrate an excellent balance between latency and accuracy.
  3. You may want to see the engagement stats here. The GitHub link- https://github.com/Deci-AI/super-gradients/blob/master/YOLONAS-POSE.md, GitHub Stars- 3.9k, GitHub Fork- 430, Issues- 84.

AlphaPose:

  1. Shanghai Jiao Tong University holds the licence for AlphaPose.
  2. One must explore the licence types, as the models within AlphaPose carry different licences: copyright held by the University of Michigan, the Apache License, and the MIT License.
  3. AlphaPose is freely available for non-commercial use. To obtain a commercial license, one must reach an agreement with the licensor, Shanghai Jiao Tong University.
  4. The models that are part of the AlphaPose framework are AccuratePose, SinglePose, and MultiPose.
  5. You may want to see the engagement stats here — The GitHub link- https://github.com/MVIG-SJTU/AlphaPose, GitHub Stars- 7.5k, GitHub Fork- 1.9k, Issues- 245.
  6. Edge devices- Devices like the NVIDIA Jetson series, Google Coral, or Raspberry Pi can run AlphaPose for edge computing applications.

TensorFlow Pose-detection:

  1. This repository comes to your rescue if you are looking to build a human pose estimation model for commercial purposes.
  2. The stable TensorFlow Pose-detection models ship under the Apache 2.0 license, which permits commercial use.
  3. It is a collection of pose estimation models built on tensorflow.js.
  4. You can also run the models supported by tensorflow.js in Python, for example in Google Colab.
  5. There are 3 architectures in TensorFlow Pose-detection: MoveNet (2021; detects 17 key points of a body), BlazePose (MediaPipe BlazePose detects 33 key points, the 17 COCO key points plus additional ones) and PoseNet (2018; detects multiple poses, each containing 17 key points).
  6. MoveNet comes in two variants: MoveNet.Lightning and MoveNet.Thunder. MoveNet outperforms PoseNet on various datasets, especially on images of fitness actions, so MoveNet is recommended over PoseNet.
  7. You may want to see the engagement stats here — The GitHub link- https://github.com/tensorflow/tfjs-models/tree/master/pose-detection, tfjs GitHub Stars- 13.4k, GitHub Fork- 4.2k, Issues- 245.
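As a concrete illustration of these formats, below is a minimal sketch of decoding the kind of output MoveNet and PoseNet produce: 17 COCO-ordered key points, each assumed here to be a normalized [y, x, score] triple. The raw values in the example are made up, not real model output.

```python
# COCO 17-keypoint names, in the order used by MoveNet and PoseNet.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def decode_keypoints(raw, img_h, img_w, min_score=0.3):
    """Convert normalized [y, x, score] triples to named pixel coordinates,
    dropping low-confidence detections."""
    out = {}
    for name, (y, x, score) in zip(COCO_KEYPOINTS, raw):
        if score >= min_score:
            out[name] = (round(x * img_w), round(y * img_h), score)
    return out

# Made-up example output: two confident keypoints, the other 15 near zero.
raw = [(0.1, 0.5, 0.9), (0.12, 0.52, 0.8)] + [(0.0, 0.0, 0.0)] * 15
pose = decode_keypoints(raw, img_h=480, img_w=640)
print(pose["nose"])  # (320, 48, 0.9)
```

BlazePose output can be handled the same way, with 33 names instead of 17.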

DensePose Detectron2 by Meta

  1. DensePose aims at learning and establishing dense correspondences between image pixels and 3D object geometry for deformable objects, such as humans or animals. In simple words, DensePose estimation can map all human pixels of 2D RGB images to a 3D surface-based model of the body in real time.
  2. In this repository, authors have provided the code to train and evaluate DensePose R-CNN and various tools to visualize DensePose annotations and results.
  3. Chart-based dense pose estimation for humans and animals. In this context, a ‘chart’ is a mapping that relates each pixel of the 2D image to a point on a 3D surface model; the model is divided into several charts for more accurate mapping. Each chart corresponds to a specific part of the body: one chart might map the human torso, while another maps the limbs.
    Dense pose estimation represents a significant leap from traditional pose estimation by providing a more detailed and comprehensive understanding of bodily movements and positions. The chart-based approach makes it possible to accurately map the complex contours and movements of human and animal bodies, paving the way for innovative applications across various fields.
  4. Continuous surface embeddings for dense pose estimation of humans and animals. Continuous surface embeddings represent every point on the body’s surface in a continuous, high-dimensional space, which allows a more nuanced mapping than discrete key points. The body surface (human or animal) is typically represented as a 3D model, and the embeddings map each pixel of the 2D image onto this model.
  5. You may want to see the engagement stats here — The GitHub Link- https://github.com/facebookresearch/detectron2/tree/main/projects/DensePose, https://github.com/facebookresearch/Densepose
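To make the chart idea concrete, here is a minimal pure-Python sketch of a DensePose-style "IUV" output, using made-up values: each foreground pixel carries a body-part (chart) index I and continuous (U, V) coordinates locating it inside that chart's 2D parameterization of the 3D body surface. The tiny map below is illustrative data, not real model output.

```python
# A tiny 2x3 "image": each entry is (I, U, V).
# I = 0 marks background; I = 1, 2, ... select a body-part chart,
# and (U, V) place the pixel inside that chart's surface coordinates.
iuv_map = [
    [(0, 0.0, 0.0), (1, 0.25, 0.40), (1, 0.30, 0.45)],
    [(0, 0.0, 0.0), (2, 0.70, 0.10), (2, 0.75, 0.15)],
]

def pixels_on_chart(iuv, part_index):
    """Collect the (row, col, U, V) of every pixel mapped to one chart."""
    hits = []
    for r, row in enumerate(iuv):
        for c, (i, u, v) in enumerate(row):
            if i == part_index:
                hits.append((r, c, u, v))
    return hits

print(pixels_on_chart(iuv_map, 1))
# [(0, 1, 0.25, 0.4), (0, 2, 0.3, 0.45)]
```

A real DensePose IUV map has the same per-pixel structure, just at image resolution and with 24 body-part charts.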

Deep High-Resolution Representation Learning for Human Pose Estimation (2019)

  1. HRNets are designed to extract features at multiple scales simultaneously. This allows the network to learn high-level semantic information (like a person's overall posture) and detailed spatial information (like the precise position of joints).
  2. You may want to see the engagement stats here — The GitHub link- https://github.com/leoxiaobin/deep-high-resolution-net.pytorch, GitHub Stars- 4.2k, GitHub Fork- 907, Issues- 199.
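HRNet, like most heatmap-based estimators, predicts one low-resolution confidence map per joint; a keypoint is then recovered as the heatmap's peak, scaled back to input-image coordinates. A minimal sketch of that decoding step, using a made-up 4x4 heatmap and an assumed output stride of 4:

```python
def decode_heatmap(heatmap, stride=4):
    """Return the (x, y) image coordinate and score of a heatmap's peak.
    `heatmap` is a 2D list of per-pixel joint confidences; `stride` is
    the downsampling factor between the heatmap and the input image."""
    best = (0, 0)
    best_score = heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > best_score:
                best_score = score
                best = (r, c)
    r, c = best
    return (c * stride, r * stride, best_score)

# Made-up 4x4 heatmap for one joint; the peak sits at row 2, column 1.
hm = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.3, 0.1, 0.0],
    [0.2, 0.9, 0.2, 0.0],
    [0.0, 0.2, 0.1, 0.0],
]
print(decode_heatmap(hm))  # (4, 8, 0.9)
```

Real implementations refine the argmax with sub-pixel offsets, but the principle is the same.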

MMPose

  1. MMPose supports the use of High-Resolution Networks (HRNet) for human pose estimation.
  2. MMPose is a PyTorch-based open-source pose estimation toolkit and a member of the OpenMMLab Project. It contains a rich set of algorithms for 2D multi-person human pose estimation, 2D hand pose estimation, 2D face landmark detection, 133-keypoint whole-body human pose estimation, fashion landmark detection and animal pose estimation, as well as related components and modules.
  3. You may want to see the engagement stats here — The GitHub Link- https://github.com/open-mmlab/mmpose
  4. MMPose supports both top-down and bottom-up pose estimation approaches. Notable supported models include MotionBERT (ICCV 2023) for 3D pose estimation, DWPose (ICCVW 2023), a two-stage distillation method that achieves new SOTA performance on COCO-WholeBody, ED-Pose (ICLR 2023), which uses explicit box detection for multi-person pose estimation, and Uniformer (ICLR 2022), a top-down heatmap-based human pose estimator.
  5. It achieves higher training efficiency and higher accuracy than other popular codebases (e.g. AlphaPose, HRNet) and supports various backbone models: ResNet, HRNet, SCNet, Hourglass and HigherHRNet.
  6. Datasets:
    300WLP (IEEE’2017), CrowdPose (CVPR’2019), AI Challenger (ArXiv’2017), InterHand2.6M (ECCV’2020), Human-Art (CVPR’2023), COFW (ICCV’2013), MPII (CVPR’2014), Halpe (CVPR’2020), COCO-WholeBody-Face (ECCV’2020), DeepFashion (CVPR’2016), PoseTrack18 (CVPR’2018), JHMDB (ICCV’2013), WFLW (CVPR’2018), Animal-Pose (ICCV’2019), FreiHand (ICCV’2019), OneHand10K (TCSVT’2019), UBody (CVPR’2023), COCO-WholeBody-Hand (ECCV’2020), COCO (ECCV’2014), RHD (ICCV’2017), AP-10K (NeurIPS’2021), 300W (IMAVIS’2016), AFLW (ICCVW’2011), Grévy’s Zebra (Elife’2019), LaPa (AAAI’2020), Desert Locust (Elife’2019), Human3.6M (TPAMI’2014), COCO-WholeBody (ECCV’2020).
  7. Read more about MMPose here.
  8. Release note of MMPose.
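The top-down approach mentioned above can be sketched as follows. The detector and single-person estimator here are stubs with made-up outputs standing in for real MMPose models; a bottom-up method would instead predict all keypoints in the image at once and then group them into people.

```python
def detect_people(image):
    """Stub person detector: returns bounding boxes as (x, y, w, h).
    In MMPose this would be a real detection model (e.g. an MMDetection one)."""
    return [(10, 10, 50, 120), (200, 30, 60, 130)]

def estimate_single_pose(image, box):
    """Stub single-person estimator: returns keypoints for one cropped box.
    The single made-up point stands in for a full 17-keypoint prediction."""
    x, y, w, h = box
    return [(x + w // 2, y + h // 4)]

def top_down(image):
    """Top-down pipeline: detect each person first, then run a single-person
    estimator on every crop. Accurate, but cost grows with crowd size."""
    return [estimate_single_pose(image, box) for box in detect_people(image)]

poses = top_down(image=None)
print(len(poses))  # one pose list per detected person
```

The trade-off this sketch illustrates: top-down runtime scales with the number of detected people, while bottom-up runtime is roughly constant per image but grouping keypoints into people is harder in crowded scenes.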

Note that “IDEA License 1.0” is a custom, organization-specific license, so it is important to read through its terms to understand the permissions, conditions, and limitations it sets forth. Custom licenses can vary significantly in their terms and may have specific stipulations that are not commonly found in more standard licenses.

DeepPose: Human Pose Estimation via Deep Neural Networks by Google (2014)

  1. DeepPose was one of the first major efforts to apply deep learning techniques to the problem of pose estimation. Before DeepPose, most pose estimation models relied on traditional computer vision techniques. DeepPose marked a paradigm shift by using deep neural networks for the task.
  2. Must-read paper- Official paper

Ultralytics YOLOv8 Pose estimation

  1. The YOLOv8 Pose model, part of the Ultralytics YOLOv8 series, is used for pose estimation tasks.
  2. The Ultralytics YOLOv8 Pose estimation models are licensed under the AGPL-3.0 License. This open-source license is designed to encourage open collaboration and the free exchange of knowledge: it requires that software and AI models built on it be open-sourced under the same terms.
  3. Ultralytics also offers an alternative licensing option for commercial use.

The applications that the leading AI companies are building around human pose estimation

  1. Products that Microsoft is building for the Human pose estimation market
  2. Intel and Human pose estimation
  3. 3DAT (3D Athlete Tracking) deployment by Amazon
  4. Meta — Introducing Ego-Exo4D: A foundational dataset for research on video learning and multimodal perception

License options available with GitHub:

Conclusion:

After my research, I am sharing the table below, which motivates me to begin my POC with MMPose, followed by DensePose, and then the sequence mentioned in the table.


It should give you a fair idea of how to get started with your own POC as well. Almost all of these models have pre-trained weights, architectures and backbones that support 2D, 3D or both types of human pose estimation.

This blog is also a call for contributions. Feel free to use the comment section to discuss the topic of human pose estimation further.

If this blog helped you learn something new today, feel free to clap and follow my page for the upcoming progress of this project of AI mama.

References:

  1. Introduction to 2D and 3D Pose Estimation Data
  2. Evolution of pose estimation since the year 2014


Pallawi

Computer Vision contributor. Lead Data Scientist @https://www.here.com/ Love Data Science.