Reconstructing Humans with a Biomechanically Accurate Skeleton

Yan Xia, Xiaowei Zhou, Etienne Vouga, Qixing Huang, Georgios Pavlakos

TL;DR

This work tackles the biomechanical plausibility gap in vision-based 3D human reconstruction by introducing HSMR, an end-to-end method that uses a transformer to regress SKEL model parameters from a single image. It leverages SMPL-to-SKEL conversion to generate pseudo ground truth and employs an iterative SKELify refinement to progressively improve supervision, enabling training without SKEL-annotated datasets. HSMR achieves results competitive with SMPL-based methods on standard benchmarks while delivering substantial gains for extreme poses and viewpoints due to biomechanical regularization and reduced joint-angle violations. The approach promises biomechanically valid reconstructions suitable for simulations and biomechanics research, with code and data openly released for reproducibility.

Abstract

In this paper, we introduce a method for reconstructing 3D humans from a single image using a biomechanically accurate skeleton model. To achieve this, we train a transformer that takes an image as input and estimates the parameters of the model. Due to the lack of training data for this task, we build a pipeline to produce pseudo ground truth model parameters for single images and implement a training procedure that iteratively refines these pseudo labels. Compared to state-of-the-art methods for 3D human mesh recovery, our model achieves competitive performance on standard benchmarks, while significantly outperforming them in settings with extreme 3D poses and viewpoints. Additionally, we show that previous reconstruction methods frequently violate joint angle limits, leading to unnatural rotations. In contrast, our approach leverages biomechanically plausible degrees of freedom, yielding more realistic joint rotation estimates. We validate our approach across multiple human pose estimation benchmarks. We make the code, models, and data available at: https://isshikihugh.github.io/HSMR/
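The claim that reconstruction methods "violate joint angle limits" can be made concrete with a small check. The following is a minimal sketch, not the paper's evaluation code: it assumes joint rotations have already been converted to per-axis angles in degrees, and the limit values and the function name `violation_rate` are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical per-axis knee limits in degrees (flexion, abduction, rotation).
# These numbers are illustrative assumptions, not values from the paper or SKEL.
KNEE_LIMITS_DEG = np.array([
    [0.0, 150.0],   # flexion
    [-5.0, 5.0],    # abduction
    [-10.0, 10.0],  # axial rotation
])

def violation_rate(angles_deg, limits_deg):
    """Fraction of samples where at least one axis lies outside its limits.

    angles_deg: array of shape (N, D), one row per sample.
    limits_deg: array of shape (D, 2) holding (lower, upper) per axis.
    """
    angles = np.asarray(angles_deg, dtype=float)
    lower, upper = limits_deg[:, 0], limits_deg[:, 1]
    outside = (angles < lower) | (angles > upper)  # broadcast over samples
    return float(outside.any(axis=1).mean())

# Four made-up knee estimates: the second and third break a limit.
samples = [
    [30.0, 0.0, 0.0],
    [60.0, 20.0, 0.0],   # too much abduction
    [-15.0, 0.0, 0.0],   # hyperextension
    [90.0, 2.0, -3.0],
]
rate = violation_rate(samples, KNEE_LIMITS_DEG)
```

A model whose knee has a single flexion degree of freedom cannot produce the abduction error in the second sample by construction, which is the intuition behind using a biomechanical parameterization.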

Paper Structure

This paper contains 13 sections, 3 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Human Skeleton and Mesh Recovery (HSMR). We propose an approach that recovers the biomechanical skeleton and the surface mesh of a human from a single image. We adopt a recent biomechanical model, SKEL [keller2023skin], and train a transformer to estimate the parameters of the model. We encourage the reader to see the skeleton and surface reconstructions on our project page: https://isshikihugh.github.io/HSMR/.
  • Figure 2: Overview of our HSMR approach. A key design choice of HSMR is the adoption of the SKEL parametric body model [keller2023skin], which uses a biomechanically accurate skeleton. We employ a transformer-based architecture that takes as input a single image of a person and estimates the pose $q$ and shape parameters $\beta$ of SKEL, as well as the camera $\pi$. During training, we iteratively update the pseudo ground truth we use to supervise our model, aiming to improve its quality. For this, we optimize the HSMR estimate to align with the ground-truth 2D keypoints (SKELify). The output parameters of the optimization are used in future training iterations as the supervision target.
  • Figure 3: Failure cases of SMPL-to-SKEL conversion. While we can technically fit SKEL to an instance of the SMPL model, this conversion can often lead to problematic SKEL results. Here, we visualize SMPL meshes (light green) and the SKEL meshes we get when we try to fit the SKEL model to the SMPL mesh (light blue). For the fitting, we use the optimization code of [keller2023skin].
  • Figure 4: Examples of unnatural joint rotation for SMPL. SMPL represents the knee with a ball-and-socket joint. This allows mesh recovery methods like HMR2.0 [goel2023humans] to generate invalid rotations. We visualize examples from HMR2.0 (light green) where the knee is bent in unnatural ways. In comparison, the HSMR output (light blue) respects the biomechanical constraints.
  • Figure 5: Qualitative evaluation of HSMR. For each input example we show: a) the input image, b) the overlay of SKEL in the input view, c) a side view, d) the top view. We visualize both the skeleton and the transparent mesh of the estimated SKEL.
  • ...and 2 more figures
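
The refinement step described in the Figure 2 caption (optimize the current estimate against ground-truth 2D keypoints, then reuse the result as the supervision target) can be sketched on a toy problem. This is a hedged illustration only, not SKELify itself: it assumes a planar two-segment leg with a hinge knee instead of the SKEL model, a fixed scale-and-shift camera, and plain gradient descent with numerical gradients. All names (`leg_joints`, `project`, `refine`, `pseudo_gt`) and constants are hypothetical.

```python
import numpy as np

LEG_LENGTHS = (0.45, 0.45)  # hypothetical thigh and shank lengths in meters

def leg_joints(q):
    """Toy planar leg; q = (hip_angle, knee_flexion) in radians.

    Returns hip, knee, and ankle positions as a (3, 2) array.
    """
    hip_angle, knee_flexion = q
    shank_angle = hip_angle - knee_flexion  # hinge knee: one degree of freedom
    hip = np.zeros(2)
    knee = hip + LEG_LENGTHS[0] * np.array([np.sin(hip_angle), -np.cos(hip_angle)])
    ankle = knee + LEG_LENGTHS[1] * np.array([np.sin(shank_angle), -np.cos(shank_angle)])
    return np.stack([hip, knee, ankle])

def project(joints, scale=1.0, shift=(0.0, 0.0)):
    """Simplified camera: uniform scale plus a 2D shift (an assumption)."""
    return scale * joints + np.asarray(shift)

def reprojection_error(q, keypoints_2d):
    """Mean squared distance between projected joints and 2D keypoints."""
    diff = project(leg_joints(q)) - keypoints_2d
    return float(np.mean(np.sum(diff ** 2, axis=1)))

def refine(q_init, keypoints_2d, steps=500, lr=1.0, eps=1e-5, knee_range=(0.0, 2.4)):
    """Gradient descent on the reprojection error with central differences.

    The knee flexion is clipped to knee_range after every step, standing in
    for a joint limit (the range is illustrative).
    """
    q = np.array(q_init, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(q)
        for i in range(q.size):
            step = np.zeros_like(q)
            step[i] = eps
            grad[i] = (reprojection_error(q + step, keypoints_2d)
                       - reprojection_error(q - step, keypoints_2d)) / (2 * eps)
        q -= lr * grad
        q[1] = np.clip(q[1], *knee_range)
    return q

# Synthetic example: 2D keypoints generated from a known pose.
q_true = np.array([0.3, 0.8])
keypoints_2d = project(leg_joints(q_true))

# The initial value stands in for a network prediction for this image.
pseudo_gt = {"image_0": np.array([0.0, 0.2])}
error_before = reprojection_error(pseudo_gt["image_0"], keypoints_2d)

# Refine the estimate and store it as the supervision target for later iterations.
pseudo_gt["image_0"] = refine(pseudo_gt["image_0"], keypoints_2d)
error_after = reprojection_error(pseudo_gt["image_0"], keypoints_2d)
```

In a training loop, the stored parameters would replace the previous pseudo label for that image, so later iterations are supervised by targets that agree better with the 2D annotations.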