
Efficient 2D to Full 3D Human Pose Uplifting including Joint Rotations

Katja Ludwig, Yuliia Oksymets, Robin Schön, Daniel Kienzle, Rainer Lienhart

TL;DR

This work tackles the need for full 3D human pose estimation, covering both joint locations and joint rotations, from monocular input. It extends a fast 2D-to-3D uplifting framework (UU) to predict rotations directly and explores several rotation representations (rotation matrices $R \in SO(3)$, axis-angle, and quaternions) together with several supervision strategies: full supervision, supervision through an SMPL-X layer, and weakly supervised variants assisted by a VPoser prior. The proposed methods achieve state-of-the-art rotation accuracy (lower MPJAE) while eliminating the costly inverse kinematics (IK) step, which makes them roughly 150× faster than IK-based approaches, and they maintain or improve joint localization accuracy (MPJPE) on Fit3D. These rotation-aware, IK-free models are of practical value for sports analytics and biomechanics, enabling near real-time analysis of an athlete's motion with detailed joint orientations. Together, the Transformer-based uplifting model, the SMPL-X integration, and the body prior form an efficient framework for rotation-aware 3D pose estimation from 2D poses.
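
MPJPE and MPJAE, the two metrics named above, can be sketched as follows. This assumes MPJPE is the mean Euclidean distance between predicted and ground-truth joint positions and MPJAE is the mean geodesic angle between predicted and ground-truth joint rotation matrices, reported in degrees. The helper names and the plain-list data layout are hypothetical, and details of the paper's evaluation protocol (root alignment, joint subset, averaging over frames) are not reproduced here.

```python
import math

Vec3 = tuple[float, float, float]
Mat3 = list[list[float]]


def mpjpe(pred: list[Vec3], gt: list[Vec3]) -> float:
    """Mean per-joint position error: average Euclidean distance over joints."""
    assert pred and len(pred) == len(gt)
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def geodesic_angle(ra: Mat3, rb: Mat3) -> float:
    """Angle in radians of the relative rotation ra^T rb, in [0, pi]."""
    # trace(ra^T rb) equals the sum of element-wise products of ra and rb
    trace = sum(ra[i][j] * rb[i][j] for i in range(3) for j in range(3))
    # clamp to [-1, 1] to guard against floating-point drift before acos
    return math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))


def mpjae(pred: list[Mat3], gt: list[Mat3]) -> float:
    """Mean per-joint angle error in degrees."""
    assert pred and len(pred) == len(gt)
    mean_rad = sum(geodesic_angle(p, g) for p, g in zip(pred, gt)) / len(pred)
    return math.degrees(mean_rad)


def rot_z(deg: float) -> Mat3:
    """Rotation matrix about the z-axis (helper for the example)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


if __name__ == "__main__":
    # Second joint is off by 0.1 along y; first joint is exact.
    print(mpjpe([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0.1, 0)]))  # approximately 0.05
    # First joint is rotated 10 degrees too far; second joint is exact.
    print(mpjae([rot_z(10), rot_z(0)], [rot_z(0), rot_z(0)]))  # approximately 5.0
```

Because MPJAE averages the angle of the relative rotation, it does not depend on which representation the model regresses; the prediction only has to be converted to a rotation matrix first.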

Abstract

In sports analytics, accurately capturing both the 3D locations and rotations of body joints is essential for understanding an athlete's biomechanics. While Human Mesh Recovery (HMR) models can estimate joint rotations, they often exhibit lower accuracy in joint localization compared to 3D Human Pose Estimation (HPE) models. Recent work addressed this limitation by combining a 3D HPE model with inverse kinematics (IK) to estimate both joint locations and rotations. However, IK is computationally expensive. To overcome this, we propose a novel 2D-to-3D uplifting model that directly estimates 3D human poses, including joint rotations, in a single forward pass. We investigate multiple rotation representations, loss functions, and training strategies, both with and without access to ground-truth rotations. Our models achieve state-of-the-art accuracy in rotation estimation, are 150 times faster than the IK-based approach, and surpass HMR models in joint localization precision.
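
The rotation representations investigated here are interchangeable descriptions of the same rotation. The sketch below converts an axis-angle vector into a unit quaternion and into a rotation matrix via Rodrigues' formula, and also builds the matrix from the quaternion as a cross-check. The function names and the (w, x, y, z) quaternion ordering are assumptions for illustration; the paper's own conventions are not stated in this section.

```python
import math


def axis_angle_to_quat(v):
    """Axis-angle vector v (unit axis * angle in radians) -> unit quaternion (w, x, y, z)."""
    angle = math.sqrt(sum(comp * comp for comp in v))
    if angle < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)  # zero rotation
    s = math.sin(angle / 2) / angle  # scales v to axis * sin(angle / 2)
    return (math.cos(angle / 2), v[0] * s, v[1] * s, v[2] * s)


def quat_to_matrix(q):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix (row-major nested lists)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def axis_angle_to_matrix(v):
    """Rodrigues' formula: R = I + sin(a) K + (1 - cos(a)) K^2, K = skew matrix of the unit axis."""
    angle = math.sqrt(sum(comp * comp for comp in v))
    if angle < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (comp / angle for comp in v)
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)] for i in range(3)]
    s, c = math.sin(angle), math.cos(angle)
    return [
        [(1.0 if i == j else 0.0) + s * K[i][j] + (1 - c) * K2[i][j] for j in range(3)]
        for i in range(3)
    ]


if __name__ == "__main__":
    v = (0.0, 0.0, math.pi / 2)  # 90 degrees about the z-axis
    print(axis_angle_to_quat(v))  # approximately (0.7071, 0, 0, 0.7071)
    print(axis_angle_to_matrix(v))  # approximately [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
```

Quaternions $q$ and $-q$ encode the same rotation, and axis-angle vectors of length $\pi$ along opposite axes coincide as well; such ambiguities are one commonly cited reason why the choice of representation and loss can matter when rotations are regressed directly.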

Paper Structure

This paper contains 20 sections, 3 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Two examples comparing the results of our model (blue, left column) with those of the SOTA HMR model Multi-HMR [multihmr] (yellow, right column). The meshes are created from the estimated rotations and the ground-truth body shape.
  • Figure 2: General architecture of the UU model. A pose sequence is passed through a spatial Transformer $T_{Sp}$, which operates within each pose, followed by a temporal Transformer $T_{T}$, which operates across poses. A linear layer outputs a 3D pose estimate for every pose in the input sequence, while a strided Transformer $T_{St}$ followed by a final linear layer reduces the sequence length to a single 3D pose estimate for the central frame at position $t$ of the input sequence.
  • Figure 3: Visualization of the chordal and geodesic distances between two rotation matrices $R_A$ and $R_B$, illustrated on a sphere representing $SO(3)$.
  • Figure 4: Naive approach to estimating rotations. A second head (green) is added to the UU model (blue) to estimate a rotation sequence $\mathcal{R}$ and a refined rotation estimate $\tilde{\theta}_t$ for the central frame.
  • Figure 5: Unified estimation of central-frame joint rotations and locations. The estimated central-frame rotations $\tilde{\theta}_t$ are passed through an SMPL-X layer (orange) to obtain the joint locations. The body shape $\beta_t$ is either the ground truth or estimated with the methods from [a2b].
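
To make the two attention axes of Figure 2 concrete, the sketch below applies parameter-free scaled dot-product self-attention first across the joints of each frame (intra-pose, the role of $T_{Sp}$) and then across frames (inter-pose, the role of $T_{T}$), and finally keeps the central frame $t$. It is a hypothetical toy: it has no learned projections, heads, MLPs, or residual connections; it assumes, as in PoseFormer-style designs, that each frame's joint features are flattened into one frame token before temporal attention; and it replaces the strided Transformer $T_{St}$ with plain central-frame selection. It only illustrates which tokens attend to which, not the UU model itself.

```python
import math

Matrix = list[list[float]]  # rows are tokens, columns are feature dimensions


def self_attention(tokens: Matrix) -> Matrix:
    """Parameter-free scaled dot-product self-attention (queries = keys = values = tokens)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) / total for j in range(d)])
    return out


def toy_uplift_pass(seq: list[Matrix]) -> list[float]:
    """seq: T frames, each a J x 2 list of 2D keypoints. Returns the central frame's features."""
    # Spatial step (intra-pose): the joints of one frame attend to each other.
    spatial = [self_attention(frame) for frame in seq]
    # Flatten each frame's J x 2 features into a single frame token of length 2J.
    frame_tokens = [[x for joint in frame for x in joint] for frame in spatial]
    # Temporal step (inter-pose): frame tokens attend to each other across time.
    temporal = self_attention(frame_tokens)
    # Stand-in for the strided head: keep only the central frame t = T // 2.
    return temporal[len(temporal) // 2]


if __name__ == "__main__":
    seq = [[[0.1 * f + j, 0.2 * j] for j in range(3)] for f in range(5)]  # T=5, J=3
    print(len(toy_uplift_pass(seq)))  # 6 features: 3 joints x 2 coordinates
```

Without learned weights, every output is a convex combination of input coordinates, so the toy cannot actually lift 2D to 3D; in the real model, learned embeddings and the final linear layers produce the 3D output.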
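
The two distances in Figure 3 are commonly defined as the chordal distance $\lVert R_A - R_B \rVert_F$ (Frobenius norm) and the geodesic distance $\theta$, the angle of the relative rotation $R_A^\top R_B$ with $\cos\theta = (\operatorname{tr}(R_A^\top R_B) - 1)/2$. Since $\lVert R_A - R_B \rVert_F^2 = 6 - 2\operatorname{tr}(R_A^\top R_B) = 8\sin^2(\theta/2)$, the chordal distance equals $2\sqrt{2}\sin(\theta/2)$. The sketch below checks this numerically; whether the paper's losses use exactly these definitions (for example, squared or rescaled) is an assumption, and the helper names are hypothetical.

```python
import math

Mat3 = list[list[float]]


def rot_x(deg: float) -> Mat3:
    """Rotation matrix about the x-axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_z(deg: float) -> Mat3:
    """Rotation matrix about the z-axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def chordal_distance(ra: Mat3, rb: Mat3) -> float:
    """Frobenius norm of R_A - R_B (straight-line distance in the space of 3x3 matrices)."""
    return math.sqrt(sum((ra[i][j] - rb[i][j]) ** 2 for i in range(3) for j in range(3)))


def geodesic_distance(ra: Mat3, rb: Mat3) -> float:
    """Rotation angle (radians) of R_A^T R_B, i.e. the distance along SO(3)."""
    trace = sum(ra[i][j] * rb[i][j] for i in range(3) for j in range(3))  # tr(R_A^T R_B)
    return math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))


if __name__ == "__main__":
    ra = rot_z(20)
    rb = matmul(rot_z(20), rot_x(70))  # R_B differs from R_A by a 70-degree rotation
    theta = geodesic_distance(ra, rb)
    print(math.degrees(theta))  # approximately 70
    print(chordal_distance(ra, rb), 2 * math.sqrt(2) * math.sin(theta / 2))  # equal up to rounding
```

The chordal form avoids the arccos, whose derivative with respect to the trace is unbounded at $\theta = 0$ and $\theta = \pi$, while the geodesic form grows linearly with the actual rotation angle.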
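
A regression head like the green branch in Figure 4 outputs unconstrained numbers, which must be mapped to valid rotations. For the quaternion representation, a common choice is to normalize each 4-vector to unit length and, optionally, flip its sign so that $w \ge 0$, since $q$ and $-q$ encode the same rotation. The sketch below applies this to a hypothetical raw head output of shape T × J × 4 and takes the central frame at index $\lfloor T/2 \rfloor$; the shapes, names, and sign convention are assumptions, not details given in the paper.

```python
import math

Quat = tuple[float, float, float, float]  # (w, x, y, z)


def to_unit_quaternion(raw: list[float], eps: float = 1e-8) -> Quat:
    """Normalize a raw 4-vector to a unit quaternion with non-negative w."""
    norm = math.sqrt(sum(c * c for c in raw))
    if norm < eps:
        return (1.0, 0.0, 0.0, 0.0)  # fall back to the identity for degenerate outputs
    w, x, y, z = (c / norm for c in raw)
    if w < 0:  # q and -q are the same rotation; keep the w >= 0 hemisphere
        w, x, y, z = -w, -x, -y, -z
    return (w, x, y, z)


def postprocess_head(raw_seq: list[list[list[float]]]) -> tuple[list[list[Quat]], list[Quat]]:
    """raw_seq: T frames x J joints x 4 raw values.
    Returns the full rotation sequence and the rotations of the central frame."""
    seq = [[to_unit_quaternion(r) for r in frame] for frame in raw_seq]
    return seq, seq[len(seq) // 2]


if __name__ == "__main__":
    raw = [[[0.0, 0.0, 0.0, 2.0]], [[3.0, 4.0, 0.0, 0.0]], [[-2.0, 0.0, 2.0, 0.0]]]
    seq, central = postprocess_head(raw)
    print(central)  # approximately [(0.6, 0.8, 0.0, 0.0)]
```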
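
Figure 5 relies on an SMPL-X layer to turn rotations into joint locations. At its core this is forward kinematics over the kinematic tree: each joint's global rotation is its parent's global rotation composed with its own local rotation, and its position is the parent's position plus the parent's global rotation applied to the rest-pose bone offset. The sketch below implements this for a hypothetical three-joint chain with fixed offsets. Real SMPL-X additionally derives the rest-pose joint locations from the shape parameters $\beta$; shape, meshes, and skinning are omitted entirely here.

```python
import math

Mat3 = list[list[float]]
Vec3 = list[float]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a: Mat3, v: Vec3) -> Vec3:
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def rot_z(deg: float) -> Mat3:
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def forward_kinematics(parents, offsets, local_rots, root_pos):
    """parents[i]: parent index of joint i (-1 for the root; parents come before children).
    offsets[i]: rest-pose vector from the parent to joint i.
    local_rots[i]: rotation of joint i relative to its parent.
    Returns the global joint positions."""
    n = len(parents)
    global_rots: list = [None] * n
    positions: list = [None] * n
    for i in range(n):
        p = parents[i]
        if p < 0:
            global_rots[i] = local_rots[i]
            positions[i] = list(root_pos)
        else:
            global_rots[i] = matmul(global_rots[p], local_rots[i])
            step = matvec(global_rots[p], offsets[i])  # bone rotated by the parent's frame
            positions[i] = [positions[p][k] + step[k] for k in range(3)]
    return positions


if __name__ == "__main__":
    parents = [-1, 0, 1]  # root -> joint 1 -> joint 2
    offsets = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    rots = [rot_z(90), rot_z(90), rot_z(0)]
    print(forward_kinematics(parents, offsets, rots, [0.0, 0.0, 0.0]))
    # approximately [[0, 0, 0], [0, 1, 0], [-1, 1, 0]]
```

Because this mapping is differentiable, a loss on the resulting joint locations can back-propagate into the predicted rotations, which is what allows joint-location supervision through the SMPL-X layer.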