A Survey on 3D Egocentric Human Pose Estimation

Md Mushfiqur Azam, Kevin Desai

TL;DR

This survey addresses 3D egocentric human pose estimation from first-person vision, organizing the literature around datasets, method families, and evaluation protocols. It divides approaches into skeletal-based and body-shape-based models and catalogs nine datasets spanning real, synthetic, and multi-view settings, highlighting how data design shapes model robustness. The paper analyzes state-of-the-art methods in detail, covering their benchmark performance under MPJPE-based metrics (mean per-joint position error) and the trade-offs between accuracy, real-time capability, and generalization. It also discusses open challenges such as occlusions, limited field of view, and the need for temporal and multi-view cues, and offers future directions for improving robustness and practicality in real-world scenarios.
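Since the summary above leans on MPJPE-based metrics, a minimal sketch of the plain metric may help: the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over joints (and frames, if present). The function name `mpjpe`, the array layout `(..., joints, 3)`, and the toy values are assumptions made for illustration. The survey's benchmarks may also apply root-relative or Procrustes alignment before this step, which is not shown here.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error.

    pred, gt: arrays of shape (..., num_joints, 3) holding 3D joint
    positions in the same units (assumed layout, not from the survey).
    Returns the mean Euclidean distance over all joints and leading axes.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    # Per-joint Euclidean distance, then the mean over everything else.
    per_joint_error = np.linalg.norm(pred - gt, axis=-1)
    return float(per_joint_error.mean())


# Toy example with two joints: one is off by a 3-4-5 triangle, one is exact.
gt = np.zeros((2, 3))
pred = np.array([[3.0, 4.0, 0.0],
                 [0.0, 0.0, 0.0]])
error = mpjpe(pred, gt)  # (5 + 0) / 2 = 2.5
```

The result is in the same units as the inputs, so poses stored in millimetres give an error in millimetres.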

Abstract

Egocentric human pose estimation aims to estimate human body poses and develop body representations from a first-person camera perspective. It has gained considerable popularity in recent years because of its wide range of applications in areas such as XR technologies, human-computer interaction, and fitness tracking. However, to the best of our knowledge, there is no systematic literature review of the solutions proposed for egocentric 3D human pose estimation. To that end, this survey provides an extensive overview of the current state of egocentric pose estimation research. We categorize and discuss the popular datasets and the different pose estimation models, highlighting the strengths and weaknesses of the methods through comparative analysis. This survey can be a valuable resource for both researchers and practitioners in the field, offering insights into key concepts and cutting-edge solutions in egocentric pose estimation, its wide-ranging applications, and the open problems that remain for future work.

Paper Structure (9 sections, 7 figures, 6 tables)

This paper contains 9 sections, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Difference between (a) traditional human pose estimation [ref05] and (b) egocentric human pose estimation [selfpose].
  • Figure 2: Dataset setup for UnrealEgo [unrealego]. The left image shows a pair of glasses equipped with two fisheye cameras. The middle image gives a third-person view of the person, providing context for the scene. The right image shows the person's egocentric view.
  • Figure 3: Sample image from the EgoPW dataset [egopw], showing the egocentric view on the left and the exocentric view on the right.
  • Figure 4: Sample image from the EgoGTA dataset [sceneaware].
  • Figure 5: Sample image from Wang et al.'s dataset [egoglobal].
  • ...and 2 more figures