Human Modelling and Pose Estimation Overview
Pawel Knap
TL;DR
The paper surveys human modelling and pose estimation across computer vision, computer graphics, and machine learning, focusing on camera-based human pose estimation (HPE) as the primary driver of current state-of-the-art progress. It analyzes a spectrum of representations, from 2D/3D keypoints and heatmaps to SMPL-family meshes, and reviews the datasets, metrics, and sensor modalities that shape the field. It contrasts skeleton-based and model-based approaches in both 2D and 3D, highlighting where each excels and the practical tradeoffs for real-world deployment. The paper identifies gaps in 3D mesh realism, hand/face detail, and robust performance in crowded or dynamic scenes, and calls for richer datasets, unified benchmarks, and accessible toolchains to accelerate industry adoption.
Abstract
Human modelling and pose estimation stands at the crossroads of Computer Vision, Computer Graphics, and Machine Learning. This paper presents a thorough investigation of this interdisciplinary field, examining various algorithms, methodologies, and practical applications. It explores the diverse range of sensor technologies relevant to this domain and delves into a wide array of application areas. Additionally, we discuss the challenges and advancements in 2D and 3D human modelling methodologies, along with popular datasets, metrics, and future research directions. The main contribution of this paper lies in its up-to-date comparison of state-of-the-art (SOTA) human pose estimation algorithms in both 2D and 3D domains. By providing this comprehensive overview, the paper aims to enhance understanding of 3D human modelling and pose estimation, offering insights into current SOTA achievements, challenges, and future prospects within the field.
