Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation
Nathan Louis, Mahzad Khoshlessan, Jason J. Corso
TL;DR
This work addresses a gap in the evaluation of 3D human pose predictions by introducing physics-based, temporally aware metrics of plausibility. It embeds predicted poses into a 28-DOF humanoid simulator, optimizes kinematic targets with CMA-ES so that the humanoid follows the motion under gravity and contact, and quantifies plausibility via center-of-mass (CoM) distance $CD$ and Pose Stability Duration $PSD_T$. On Human3.6M, the approach reveals that spatially accurate poses can still yield unstable or implausible motion under physics, and that the proposed metrics correlate with motion stability more robustly than framewise errors alone. The method provides a practical framework for assessing temporal physical realism in 3D human pose estimation (HPE), is compatible with off-the-shelf predictions, and enables better grounding in physical environments for AR, VR, and action understanding applications.
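To make the two metrics concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that $CD$ is the per-frame Euclidean distance between the simulated humanoid's CoM and the reference CoM, and that $PSD_T$ is the fraction of the sequence that elapses before $CD$ first exceeds a threshold $T$. Both definitions, the function names, and the toy trajectory are assumptions made for illustration; the summary above does not specify them.

```python
import numpy as np


def com_distance(sim_com, ref_com):
    """Per-frame Euclidean distance between two CoM trajectories.

    Both inputs have shape (frames, 3). This definition of CD is an
    assumption made for illustration.
    """
    sim_com = np.asarray(sim_com, dtype=float)
    ref_com = np.asarray(ref_com, dtype=float)
    return np.linalg.norm(sim_com - ref_com, axis=-1)


def pose_stability_duration(cd, threshold):
    """Fraction of frames that pass before CD first exceeds `threshold`.

    Assumed definition of PSD_T: 1.0 means the simulated humanoid stayed
    within the threshold for the whole sequence.
    """
    cd = np.asarray(cd, dtype=float)
    if cd.size == 0:
        raise ValueError("cd must contain at least one frame")
    exceeded = np.flatnonzero(cd > threshold)
    stable_frames = exceeded[0] if exceeded.size else cd.size
    return float(stable_frames) / cd.size


# Hypothetical toy data: the reference CoM stays at the origin while the
# simulated CoM drifts along the x-axis.
ref_com = np.zeros((5, 3))
sim_com = np.zeros((5, 3))
sim_com[:, 0] = [0.0, 0.1, 0.2, 0.6, 0.9]

cd = com_distance(sim_com, ref_com)
mean_cd = float(cd.mean())
psd = pose_stability_duration(cd, threshold=0.5)
```

In this toy case the drift crosses the 0.5 threshold at the fourth frame, so three of the five frames count as stable. A framewise joint error computed on the same sequence would not, by itself, indicate when stability was lost.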
Abstract
Modeling humans in physical scenes is vital for understanding human-environment interactions in applications involving augmented reality or the assessment of human actions from video (e.g., sports or physical rehabilitation). State-of-the-art literature begins with a 3D human pose, from monocular or multiple views, and uses this representation to ground the person within a 3D world space. While standard accuracy metrics capture joint position errors, they do not consider the physical plausibility of the 3D pose. This limitation has motivated researchers to propose other metrics that evaluate jitter, floor penetration, and unbalanced postures. Yet these approaches measure independent instances of errors and are not representative of balance or stability during motion. In this work, we propose measuring physical plausibility from within physics simulation. We introduce two metrics to capture the physical plausibility and stability of predicted 3D poses from any 3D Human Pose Estimation model. Using physics simulation, we discover correlations with existing plausibility metrics and measure stability during motion. We evaluate and compare the performance of two state-of-the-art methods, a multi-view triangulated baseline, and ground truth 3D markers from the Human3.6M dataset.
