A Survey on 3D Egocentric Human Pose Estimation
Md Mushfiqur Azam, Kevin Desai
TL;DR
This survey addresses 3D egocentric human pose estimation from first-person vision, organizing the literature around datasets, method families, and evaluation protocols. It categorizes approaches into skeletal-based and body-shape-based models, and catalogs nine datasets that span real, synthetic, and multi-view settings, highlighting how data design shapes model robustness. The paper analyzes state-of-the-art methods in detail, comparing their benchmark performance under MPJPE-based metrics (mean per-joint position error) and the trade-offs between accuracy, real-time capability, and generalization. It also discusses open challenges such as occlusions, limited field of view, and the need for temporal and multi-view cues, and it outlines future directions for improving robustness and practicality in real-world scenarios.
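As an illustration of the MPJPE metric mentioned above, the following is a minimal sketch that assumes the standard definition: the mean Euclidean distance between predicted and ground-truth 3D joint positions. The function name `mpjpe` and the sample coordinates are hypothetical, and the sketch omits any root-joint or Procrustes alignment that a particular benchmark protocol may apply before computing the error.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error for a single pose.

    pred, gt: sequences of (x, y, z) joint positions in the same units
    and the same joint order. No alignment step is applied here
    (an assumption of this sketch).
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and equally long")
    # Average the Euclidean distance over all corresponding joints.
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Hypothetical two-joint pose: the first joint matches exactly, the
# second is displaced by (0, 3, 4) units from the ground truth.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0)]
error = mpjpe(predicted, ground_truth)
print(error)
```

The result is expressed in the units of the input coordinates, so values are comparable only when both poses use the same units and coordinate frame.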
Abstract
Egocentric human pose estimation aims to estimate human body poses and develop body representations from a first-person camera perspective. It has gained considerable popularity in recent years because of its wide range of applications in areas such as XR technologies, human-computer interaction, and fitness tracking. However, to the best of our knowledge, there is no systematic literature review of the solutions proposed for egocentric 3D human pose estimation. To that end, this survey provides an extensive overview of the current state of egocentric pose estimation research. We categorize and discuss the popular datasets and the different pose estimation models, highlighting the strengths and weaknesses of different methods through comparative analysis. This survey can be a valuable resource for both researchers and practitioners in the field, offering insights into key concepts and cutting-edge solutions in egocentric pose estimation, its wide-ranging applications, and the open problems and directions for future work.
