Human Modelling and Pose Estimation Overview

Pawel Knap

TL;DR

The paper surveys human modelling and pose estimation across computer vision, computer graphics, and machine learning, focusing on camera-based human pose estimation (HPE) as the primary driver of current state-of-the-art (SOTA) progress. It analyzes a spectrum of representations, from 2D/3D keypoints and heatmaps to SMPL-family meshes, and surveys the datasets, metrics, and sensor modalities that shape progress. It contrasts skeleton-based with model-based approaches in both 2D and 3D, highlighting where each type excels and the practical tradeoffs for real-world deployment. The authors identify gaps in 3D mesh realism, hand/face detail, and robust performance in crowded or dynamic scenes, and call for richer datasets, unified benchmarks, and accessible toolchains to accelerate industry adoption.

Abstract

Human modelling and pose estimation stands at the crossroads of Computer Vision, Computer Graphics, and Machine Learning. This paper presents a thorough investigation of this interdisciplinary field, examining various algorithms, methodologies, and practical applications. It explores the diverse range of sensor technologies relevant to this domain and delves into a wide array of application areas. Additionally, we discuss the challenges and advancements in 2D and 3D human modelling methodologies, along with popular datasets, metrics, and future research directions. The main contribution of this paper lies in its up-to-date comparison of state-of-the-art (SOTA) human pose estimation algorithms in both 2D and 3D domains. By providing this comprehensive overview, the paper aims to enhance understanding of 3D human modelling and pose estimation, offering insights into current SOTA achievements, challenges, and future prospects within the field.

Paper Structure (21 sections, 7 figures, 2 tables)

This paper contains 21 sections, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Different representations of humans used in HPE system visualisations. Source: zheng2023deep.
  • Figure 2: Dichotomy of Human Pose Estimation algorithms.
  • Figure 3: Standard framework for estimating the pose of a single individual. Image taken from survey_HPE.
  • Figure 4: Standard frameworks for estimating poses of multiple people. Both top-down and bottom-up methods use an encoder-decoder architecture. Image taken from survey_HPE.
  • Figure 5: The workflow of the bottom-up approach used in OpenPose (openpose). The Part Confidence Maps (b) are heatmaps of body parts. After the Part Affinity Fields (c) are predicted, Bipartite Matching (d) associates body part candidates, yielding the Parsing Results (e). Image source: openpose.
  • ...and 2 more figures