PP-Motion: Physical-Perceptual Fidelity Evaluation for Human Motion Generation
Sihan Zhao, Zixuan Wang, Tianyu Luan, Jia Jia, Wentao Zhu, Jiebo Luo, Junsong Yuan, Nan Xi
TL;DR
This paper addresses the dual challenge of evaluating generated human motion for physical feasibility and perceptual realism. It introduces a physical labeling paradigm that computes the minimal corrections needed for a motion to satisfy physics, producing continuous ground-truth alignment annotations, and builds PP-Motion, a data-driven metric trained with both physical and perceptual supervision using a correlation-based loss. The approach uses a DSTformer-based encoder and an MLP decoder, with physical supervision obtained from a physics simulator and reinforcement-learning-driven corrections, achieving superior alignment with physical laws and competitive alignment with human judgments across benchmarks. The work also provides fine-grained physical annotations for MotionPercept and demonstrates that these annotations improve downstream motion evaluation and can guide motion generation.
Abstract
Human motion generation has found widespread applications in AR/VR, film, sports, and medical rehabilitation, offering a cost-effective alternative to traditional motion capture systems. However, evaluating the fidelity of such generated motions is a crucial, multifaceted task. Although previous approaches have attempted to evaluate motion fidelity using human perception or physical constraints, there remains an inherent gap between human-perceived fidelity and physical feasibility. Moreover, the subjective and coarse binary labeling of human perception further undermines the development of a robust data-driven metric. We address these issues by introducing a physical labeling method. This method evaluates motion fidelity by calculating the minimum modifications needed for a motion to align with physical laws. With this approach, we produce fine-grained, continuous physical alignment annotations that serve as objective ground truth. With these annotations, we propose PP-Motion, a novel data-driven metric to evaluate both physical and perceptual fidelity of human motion. To effectively capture underlying physical priors, we employ Pearson's correlation loss for the training of our metric. Additionally, by incorporating a human-based perceptual fidelity loss, our metric can capture fidelity that simultaneously considers both human perception and physical alignment. Experimental results demonstrate that our metric, PP-Motion, not only aligns with physical laws but also aligns better with human perception of motion fidelity than previous work.
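
As a minimal sketch of the Pearson's correlation loss mentioned above, the following plain-Python functions compute the loss over a batch of predicted scores. The function names, the `eps` stabilizer, and the sign convention (loss = 1 - r, so that predicted scores are pushed to rise with the target values) are assumptions made for illustration and are not taken from the authors' implementation; an actual training setup would use a differentiable tensor library.

```python
import math


def pearson_correlation(pred, target, eps=1e-8):
    """Pearson's r between two equal-length sequences of floats.

    `eps` is an assumed stabilizer that avoids division by zero when
    one of the sequences is constant.
    """
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    # Covariance and variances are left unnormalized; the 1/n factors cancel.
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    return cov / (math.sqrt(var_p) * math.sqrt(var_t) + eps)


def pearson_loss(pred, target):
    """Assumed convention: 0 when perfectly correlated, 2 when anti-correlated."""
    return 1.0 - pearson_correlation(pred, target)


# Hypothetical batch: metric outputs and continuous physical annotations.
scores = [0.2, 0.5, 0.9, 1.4]
labels = [1.0, 2.5, 4.5, 7.0]
loss = pearson_loss(scores, labels)
```

Because Pearson's correlation is unchanged by shifting the predictions or scaling them by a positive factor, a loss of this form constrains how the predicted scores vary relative to the annotations rather than their absolute magnitude, which is one plausible reason to prefer it when the annotation scale is arbitrary.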
