Mimicking-Bench: A Benchmark for Generalizable Humanoid-Scene Interaction Learning via Human Mimicking
Yun Liu, Bowen Yang, Licheng Zhong, He Wang, Li Yi
TL;DR
Mimicking-Bench is a comprehensive benchmark for learning generalizable humanoid-scene interaction by mimicking large-scale human references. It pairs a six-task, geometry-rich environment with a modular three-stage skill-learning paradigm (retargeting, tracking, imitation) and a large-scale human reference dataset, enabling both pipeline-level and per-module evaluation. The experiments show that human mimicking improves task success and generalization, highlight critical design choices in the retargeting, tracking, and imitation components, and identify directions for future research, such as dexterous hand integration. The work provides a foundation for systematic, scalable exploration of humanoid-scene interaction learning in both simulated and real-world contexts.
Abstract
Learning generic skills for humanoid robots interacting with 3D scenes by mimicking human data is a key research challenge with significant implications for robotics and real-world applications. However, existing methodologies and benchmarks are constrained by small-scale, manually collected demonstrations, and they lack the general dataset and benchmark support needed to study generalization across scene geometry. To address this gap, we introduce Mimicking-Bench, the first comprehensive benchmark designed for generalizable humanoid-scene interaction learning through mimicking large-scale human animation references. Mimicking-Bench includes six household full-body humanoid-scene interaction tasks, covering 11K diverse object shapes, along with 20K synthetic and 3K real-world human interaction skill references. We construct a complete humanoid skill learning pipeline and benchmark approaches for motion retargeting, motion tracking, and imitation learning, as well as their various combinations. Extensive experiments highlight the value of human mimicking for skill learning and reveal key challenges and research directions.
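To make the modular structure concrete, the following is a minimal sketch of how a three-stage pipeline (retargeting, then tracking, then imitation) and its combinations could be organized for evaluation. It is an illustration under stated assumptions, not the benchmark's actual code: the names `Pipeline`, `enumerate_pipelines`, and `run_pipeline`, the `Motion` type, and the toy stage functions are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List

# Hypothetical stand-in for a motion clip; real data would be pose sequences.
Motion = List[float]


@dataclass(frozen=True)
class Pipeline:
    """One combination of methods, identified by (hypothetical) method names."""
    retargeter: str
    tracker: str
    imitator: str


def enumerate_pipelines(
    retargeters: List[str], trackers: List[str], imitators: List[str]
) -> List[Pipeline]:
    """List every retargeting x tracking x imitation combination."""
    return [Pipeline(r, t, i) for r, t, i in product(retargeters, trackers, imitators)]


def run_pipeline(
    human_motion: Motion,
    retarget: Callable[[Motion], Motion],
    track: Callable[[Motion], Motion],
    imitate: Callable[[Motion], float],
) -> float:
    """Chain the three stages; each stage consumes the previous stage's output."""
    humanoid_reference = retarget(human_motion)  # human motion -> humanoid reference
    tracked_motion = track(humanoid_reference)   # reference -> physically tracked motion
    return imitate(tracked_motion)               # tracked motion -> scalar score (toy)


# Toy stages, assumptions for illustration only.
def toy_retarget(motion: Motion) -> Motion:
    return [0.5 * x for x in motion]


def toy_track(motion: Motion) -> Motion:
    return list(motion)


def toy_imitate(motion: Motion) -> float:
    return sum(motion)


combos = enumerate_pipelines(["r1", "r2"], ["t1"], ["i1", "i2", "i3"])
score = run_pipeline([2.0, 4.0], toy_retarget, toy_track, toy_imitate)
```

Keeping each stage behind a plain callable interface is one way to let a single module be swapped while the other two stay fixed, which is the kind of per-module comparison the abstract describes.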
