One-shot Humanoid Whole-body Motion Learning
Hao Huang, Geeta Chandra Raju Bethala, Shuaihang Yuan, Congcong Wen, Anthony Tzes, Yi Fang
TL;DR
This work tackles the data bottleneck in learning expressive humanoid whole-body motion by showing that a policy can be learned from a single non-walking target sample supplemented by numerous walking motions. It introduces a one-shot pipeline that uses order-preserving optimal transport (OPOT) to align walking sequences with the target, interpolates along geodesics to generate intermediate poses, and enforces collision-free configurations before retargeting to a humanoid for RL training in simulation. Key contributions include the OPOT-based sequence alignment, geodesic pose sampling, and differentiable collision-aware optimization on the pose-skeleton manifold, which together enable policy learning without training neural networks for motion generation. The approach outperforms baselines on the CMU MoCap dataset in sim-to-sim transfer and reduces the data-collection burden, offering a practical path to data-efficient, expressive humanoid control.
Abstract
Whole-body humanoid motion is a cornerstone challenge in robotics, integrating balance, coordination, and adaptability to enable human-like behaviors. However, existing methods typically require multiple training samples per motion category, which makes collecting high-quality human motion datasets both labor-intensive and costly. To address this, we propose a novel approach that trains effective humanoid motion policies using only a single non-walking target motion sample alongside readily available walking motions. The core idea is to use order-preserving optimal transport to compute distances between walking and non-walking sequences, then interpolate along geodesics to generate new intermediate pose skeletons. These skeletons are optimized for collision-free configurations and retargeted to the humanoid, and the resulting motions are used in a simulated environment to train a policy via reinforcement learning. Experimental evaluations on the CMU MoCap dataset demonstrate that our method consistently outperforms baselines across metrics. Code will be released upon acceptance.
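The alignment and interpolation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the temporal penalty added to the frame cost is a simplified stand-in for an order-preserving prior, the Sinkhorn loop is a plain (not log-stabilized) version, and representing each joint rotation as a unit quaternion is an assumption made here for illustration. The names `order_preserving_plan` and `slerp` and the parameters `lam`, `eps`, and `iters` are hypothetical.

```python
import numpy as np


def order_preserving_plan(X, Y, lam=1.0, eps=0.05, iters=500):
    """Entropic transport plan between frame sequences X (n, d) and Y (m, d).

    A quadratic penalty on the difference of relative time stamps discourages
    matches that violate temporal order (simplified assumption, see lead-in).
    """
    n, m = len(X), len(Y)
    # Pairwise squared Euclidean cost between frames, scaled to [0, 1].
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    C = C / (C.max() + 1e-12)
    # Relative temporal position of each frame in its own sequence.
    ti = (np.arange(n) + 1) / n
    tj = (np.arange(m) + 1) / m
    T = (ti[:, None] - tj[None, :]) ** 2
    # Gibbs kernel of the combined cost.
    K = np.exp(-(C + lam * T) / eps)
    # Uniform marginals over frames.
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    u = np.ones(n)
    for _ in range(iters):
        # Alternate scaling to match column and row marginals.
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]


def slerp(q0, q1, t):
    """Point at fraction t along the geodesic between unit quaternions q0, q1."""
    q0 = q0 / np.linalg.norm(q0)
    q1 = q1 / np.linalg.norm(q1)
    dot = float(np.dot(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; take the shorter arc.
        q1, dot = -q1, -dot
    theta = np.arccos(min(dot, 1.0))
    if theta < 1e-8:
        # Endpoints coincide numerically; nothing to interpolate.
        return q0
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)
```

Under these assumptions, `order_preserving_plan(walk, target).argmax(axis=1)` would pair each walking frame with a target frame, and calling `slerp` per joint on a matched pair with `t` between 0 and 1 would yield an intermediate pose. The collision-free optimization and retargeting stages are not covered by this sketch.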
