Deep learning for 3D human pose estimation and mesh recovery: A survey
Yang Liu, Changzhen Qiu, Zhiyong Zhang
TL;DR
The paper addresses the problem of reconstructing accurate 3D human pose and full-body meshes from visual data using deep learning. It synthesizes advances in single-person and multi-person 3D human pose estimation (HPE) and in explicit (parametric) and implicit mesh-recovery methods, and it outlines a taxonomy and the benchmarking landscape. Key contributions include a unified survey of over 200 references, structured coverage of sensors, representations, datasets, metrics, and applications, forward-looking research directions, and a regularly updated project page. The survey helps researchers and practitioners by clarifying method trade-offs, data requirements, and practical pathways toward real-time, detailed human models in real-world applications.
Abstract
3D human pose estimation and mesh recovery have attracted widespread research interest in many areas, such as computer vision, autonomous driving, and robotics. Deep learning for 3D human pose estimation and mesh recovery has recently thrived, with numerous methods proposed to address different problems in this area. In this paper, to stimulate future research, we present a comprehensive review of progress over the past five years in deep learning methods for this area, drawing on over 200 references. To the best of our knowledge, this survey is the first to comprehensively cover deep learning methods for 3D human pose estimation, including both single-person and multi-person approaches, as well as human mesh recovery, encompassing methods based on explicit models and implicit representations. We also present comparative results on several publicly available datasets, together with observations and future research directions. A regularly updated project page can be found at https://github.com/liuyangme/SOTA-3DHPE-HMR.
