Benchmarking Humanoid Imitation Learning with Motion Difficulty

Zhaorui Meng; Lu Yin; Xinrui Chen; Anjun Chen; Shihui Guo; Yipeng Qin

TL;DR

Existing evaluation metrics in physics-based humanoid motion imitation conflate policy performance with the intrinsic difficulty of the motions being imitated. The paper introduces the Motion Difficulty Score (MDS), a physics-grounded metric defined as the torque variation induced by bounded pose perturbations and decomposed into Spectral Diversity, Variance Diversity, and Segment Diversity. It is complemented by MD-AMASS, a difficulty-aware repartitioning of the AMASS dataset. The authors validate MDS as a strong predictor of imitation error and derive two MDS-based metrics, Maximum Imitable Difficulty (MID) and Difficulty-Stratified Joint Error (DSJE), for finer, difficulty-aware evaluation. The work also reports curriculum-learning benefits and broader applicability of MDS to motion-quality assessment, anomaly detection, and cross-robot generalization.

Abstract

Physics-based motion imitation is central to humanoid control, yet current evaluation metrics (e.g., joint position error) measure only how well a policy imitates a motion, not how difficult the motion itself is. This conflates policy performance with motion difficulty, obscuring whether failures stem from poor learning or from inherently challenging motions. In this work, we address this gap with the Motion Difficulty Score (MDS), a novel metric that defines and quantifies imitation difficulty independently of policy performance. Grounded in rigid-body dynamics, MDS interprets difficulty as the torque variation induced by small pose perturbations: larger torque-to-pose variation yields flatter reward landscapes and thus higher learning difficulty. MDS captures this through three properties of the perturbation-induced torque space: volume, variance, and temporal variability. We also use MDS to construct MD-AMASS, a difficulty-aware repartitioning of the AMASS dataset. Empirically, we rigorously validate MDS by demonstrating its power to explain the performance of state-of-the-art motion imitation policies. We further demonstrate the utility of MDS through two new MDS-based metrics, Maximum Imitable Difficulty (MID) and Difficulty-Stratified Joint Error (DSJE), which provide fresh insights into imitation learning.
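To make the idea of "torque variation induced by small pose perturbations" concrete, the following is a toy sketch on a single-joint point-mass pendulum. It is not the paper's MDS formula. The model, the function names `pendulum_torque` and `torque_variation`, and the parameters `eps` and `n_samples` are all assumptions chosen for illustration. It perturbs the reference pose by bounded offsets, recomputes the inverse-dynamics torque, and reports the per-frame spread of the resulting torques.

```python
import numpy as np

def pendulum_torque(q, qdd, m=1.0, l=1.0, g=9.81):
    """Inverse dynamics of a point-mass pendulum (assumed toy model):
    tau = m * l^2 * qdd + m * g * l * sin(q), with q measured from the downward vertical."""
    return m * l**2 * qdd + m * g * l * np.sin(q)

def torque_variation(q, dt, eps=0.05, n_samples=64, seed=0):
    """Per-frame standard deviation of torque under bounded pose offsets |delta| <= eps.

    q  : 1-D array of joint angles (rad) sampled every dt seconds.
    eps: perturbation bound in radians (hypothetical parameter).
    """
    rng = np.random.default_rng(seed)
    # Finite-difference acceleration of the unperturbed reference motion.
    qdd = np.gradient(np.gradient(q, dt), dt)
    # One constant pose offset per sample, broadcast over all frames.
    deltas = rng.uniform(-eps, eps, size=(n_samples, 1))
    # Perturb the pose only; the reference acceleration is held fixed.
    taus = pendulum_torque(q[None, :] + deltas, qdd[None, :])
    return taus.std(axis=0)

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 50)
    dt = t[1] - t[0]
    hanging = torque_variation(np.zeros_like(t), dt)            # pose near q = 0
    horizontal = torque_variation(np.full_like(t, np.pi / 2), dt)  # pose near q = pi/2
    print("mean torque spread, hanging:   ", hanging.mean())
    print("mean torque spread, horizontal:", horizontal.mean())
```

In this toy model the gravity torque varies as sin(q), so the same pose offset changes the torque most near q = 0 and least near q = π/2. A full humanoid would need a multi-body inverse-dynamics routine in place of `pendulum_torque`, and the abstract's three properties (volume, variance, temporal variability) would be computed over the resulting torque samples rather than the single spread statistic used here.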
