Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation

Nathan Louis, Mahzad Khoshlessan, Jason J. Corso

TL;DR

This work addresses the gap in evaluating 3D human pose predictions by introducing physics-based, temporally aware metrics to measure plausibility. It embeds predicted poses into a 28-DOF humanoid simulator, optimizes kinematic targets with CMA-ES to follow the motion under gravity and contact, and quantifies plausibility via center-of-mass (CoM) distance $\text{CD}$ and Pose Stability Duration $\text{PSD}_T$. On Human3.6M, the approach reveals that spatially accurate poses can still yield unstable or implausible motion in physics, and that the proposed metrics correlate with motion stability more robustly than framewise errors alone. The method provides a practical framework for assessing temporal physical realism in 3D human pose estimation (HPE), is compatible with off-the-shelf predictions, and enables better grounding in physical environments for AR, VR, and action understanding applications.

Abstract

Modeling humans in physical scenes is vital for understanding human-environment interactions for applications involving augmented reality or assessment of human actions from video (e.g., sports or physical rehabilitation). State-of-the-art literature begins with a 3D human pose, from monocular or multiple views, and uses this representation to ground the person within a 3D world space. While standard metrics for accuracy capture joint position errors, they do not consider physical plausibility of the 3D pose. This limitation has motivated researchers to propose other metrics evaluating jitter, floor penetration, and unbalanced postures. Yet, these approaches measure independent instances of errors and are not representative of balance or stability during motion. In this work, we propose measuring physical plausibility from within physics simulation. We introduce two metrics to capture the physical plausibility and stability of predicted 3D poses from any 3D Human Pose Estimation model. Using physics simulation, we discover correlations with existing plausibility metrics and measure stability during motion. We evaluate and compare the performance of two state-of-the-art methods, a multi-view triangulated baseline, and ground truth 3D markers from the Human3.6M dataset.

Paper Structure

This paper contains 15 sections, 10 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: On the Human3.6M dataset (S9 - Directions 1), while the predicted pose (top row) appears plausible, in simulation we see the lean at the hip causes a loss of balance over time.
  • Figure 2: From a video $v$, we estimate 3D human poses $\mathbf{X}$ using an off-the-shelf 3D HPE model. Next, we initialize kinematic joint targets $\mathbf{q}^k_{1:T}$ on a simulated body and optimize them to mimic the reference motion under simulated environmental effects. We measure the plausibility of this optimized output using CoM distance and Pose Stability Duration.
  • Figure 3: For the S11 - WalkTogether example, we show the 3D pose, optimized simulation, and 2D reprojection. Inaccurate camera and ground plane assumptions in (c) cause the motion to fail early on as the simulated body tries to step through the ground plane (red arrows).
  • Figure 4: For the S9 - Photo 1 example, we show that a misaligned 2D reprojection (right column) can still produce a physically plausible simulation (left column) from NeuralPhysCap (Shimada et al., 2021). While this example displays a higher MPJPE-2D=$50.0$mm, we measure Pose Stability Duration to be on par with other methods, $\text{PSD}_{100}=71.9$.
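
The Figure 2 caption ends with scoring the optimized simulation by CoM distance and Pose Stability Duration. The exact definitions appear in the paper's equations, which are not reproduced on this page, so the following is only a minimal sketch under stated assumptions: CoM distance is taken as the per-frame Euclidean distance between reference and simulated centers of mass, and Pose Stability Duration over a window of T frames is taken as the percentage of frames elapsed before that distance first exceeds a threshold. The function names, the `threshold` parameter, and the sample values are all hypothetical and may differ from the authors' formulation.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def com_distance(ref_com: Sequence[Vec3], sim_com: Sequence[Vec3]) -> List[float]:
    """Per-frame Euclidean distance between reference and simulated CoM.

    Assumption: both trajectories share the same frame rate and world frame.
    """
    if len(ref_com) != len(sim_com):
        raise ValueError("trajectories must have the same number of frames")
    return [math.dist(r, s) for r, s in zip(ref_com, sim_com)]


def pose_stability_duration(
    distances: Sequence[float], window: int, threshold: float
) -> float:
    """Percentage of the first `window` frames that pass before the CoM
    distance first exceeds `threshold` (hypothetical definition).
    """
    frames = list(distances[:window])
    if not frames:
        raise ValueError("need at least one frame")
    stable = 0
    for d in frames:
        if d > threshold:
            break  # the simulated body is treated as having lost the motion
        stable += 1
    return 100.0 * stable / len(frames)


if __name__ == "__main__":
    # Toy trajectories: the simulated CoM drifts away at the third frame.
    ref = [(0.0, 0.0, 1.0)] * 4
    sim = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.5, 0.0, 1.0), (0.05, 0.0, 1.0)]
    cd = com_distance(ref, sim)
    print(pose_stability_duration(cd, window=4, threshold=0.3))
```

Under this assumed definition, a later recovery (the fourth frame above) does not extend the stable duration, which matches the intent of measuring how long a motion stays balanced rather than scoring frames independently.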