Efficient 2D to Full 3D Human Pose Uplifting including Joint Rotations
Katja Ludwig, Yuliia Oksymets, Robin Schön, Daniel Kienzle, Rainer Lienhart
TL;DR
This work addresses full 3D human pose estimation from monocular input, covering both joint locations and joint rotations. It extends a fast 2D-to-3D uplift framework (UU) to predict rotations directly, exploring rotation representations ($R \in SO(3)$, axis-angle, quaternions) and supervision strategies, including fully supervised, SMPL-X-layer-based, and VPoser-prior-assisted weakly supervised variants. The proposed methods achieve state-of-the-art rotation accuracy (lower MPJAE) while eliminating the costly inverse kinematics (IK) step, which makes them about 150× faster than the IK-based approach, and they maintain or improve joint localization (MPJPE) on fit3D. These rotation-aware, IK-free models are practically useful for sports analytics and biomechanics, where they enable near real-time analysis of an athlete's motion with detailed joint orientations. Together, the Transformer-based uplift, SMPL-X integration, and body priors form an efficient framework for rotation-aware 3D pose estimation from 2D cues.
Abstract
In sports analytics, accurately capturing both the 3D locations and rotations of body joints is essential for understanding an athlete's biomechanics. While Human Mesh Recovery (HMR) models can estimate joint rotations, they often exhibit lower accuracy in joint localization compared to 3D Human Pose Estimation (HPE) models. Recent work addressed this limitation by combining a 3D HPE model with inverse kinematics (IK) to estimate both joint locations and rotations. However, IK is computationally expensive. To overcome this, we propose a novel 2D-to-3D uplifting model that directly estimates 3D human poses, including joint rotations, in a single forward pass. We investigate multiple rotation representations, loss functions, and training strategies, both with and without access to ground-truth rotations. Our models achieve state-of-the-art accuracy in rotation estimation, are 150 times faster than the IK-based approach, and surpass HMR models in joint localization precision.
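To make the rotation representations and the rotation-error metric more concrete, the following is a minimal sketch, not the authors' implementation. It converts axis-angle vectors and unit quaternions to rotation matrices in $SO(3)$ and averages the geodesic angle between predicted and reference rotations over joints. All function names are hypothetical, and treating MPJAE as a mean per-joint geodesic angle in degrees is an assumption; the paper's exact definition (for example, local versus global rotations) may differ.

```python
import numpy as np


def axis_angle_to_matrix(v):
    """Rodrigues' formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    v = np.asarray(v, dtype=float)
    theta = np.linalg.norm(v)          # rotation angle in radians
    if theta < 1e-12:                  # no rotation: avoid division by zero
        return np.eye(3)
    k = v / theta                      # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],  # skew-symmetric cross-product matrix
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def quaternion_to_matrix(q):
    """Quaternion in (w, x, y, z) order -> rotation matrix (3, 3)."""
    q = np.asarray(q, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)  # normalize to a unit quaternion
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def geodesic_angle(R_pred, R_gt):
    """Angle (radians) of the relative rotation R_pred^T R_gt."""
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    return np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards against round-off


def mean_joint_angle_error_deg(pred, gt):
    """Mean geodesic angle over joints, in degrees (assumed MPJAE-style metric)."""
    angles = [geodesic_angle(p, g) for p, g in zip(pred, gt)]
    return float(np.degrees(np.mean(angles)))


# Example: a 90-degree rotation about the z-axis in two representations.
R_aa = axis_angle_to_matrix([0.0, 0.0, np.pi / 2])
R_q = quaternion_to_matrix([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
error = mean_joint_angle_error_deg([np.eye(3), R_aa], [np.eye(3), np.eye(3)])
```

In this toy example both representations describe the same rotation, and the two-joint error averages one perfect joint with one joint that is off by 90 degrees. The geodesic angle is one common choice of rotation distance; other losses on rotations (for instance, elementwise differences of matrices or quaternions) behave differently and are not covered by this sketch.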
