Agility Meets Stability: Versatile Humanoid Control with Heterogeneous Data
Yixuan Pan, Ruoyi Qiao, Li Chen, Kashyap Chitta, Liang Pan, Haoguang Mai, Qingwen Bu, Hao Zhao, Cunyuan Zheng, Ping Luo, Hongyang Li
TL;DR
AMS is a unified humanoid control framework that combines agile dynamic motion tracking with extreme balance maintenance in a single policy. It draws on heterogeneous data sources: human MoCap trajectories for agility and synthetic balance motions for stability. A hybrid reward scheme and an adaptive learning strategy (adaptive sampling and motion-specific reward shaping) reconcile the conflicting objectives and improve training efficiency. In simulation and in real-world tests on a Unitree G1, a single policy performs expressive motions (dancing, running) as well as zero-shot extreme balance motions (Ip Man's Squat), outperforming baselines. The work advances versatile, real-world humanoid control and teleoperation. The authors note limitations in end-effector manipulation and RGB-based pose estimation, and suggest precise manipulation and online retargeting as future work.
Abstract
Humanoid robots are envisioned to perform a wide range of tasks in human-centered environments, requiring controllers that combine agility with robust balance. Recent advances in locomotion and whole-body tracking have enabled impressive progress in either agile dynamic skills or stability-critical behaviors, but existing methods remain specialized, focusing on one capability while compromising the other. In this work, we introduce AMS (Agility Meets Stability), the first framework that unifies both dynamic motion tracking and extreme balance maintenance in a single policy. Our key insight is to leverage heterogeneous data sources: human motion capture datasets that provide rich, agile behaviors, and physically constrained synthetic balance motions that capture stability configurations. To reconcile the divergent optimization goals of agility and stability, we design a hybrid reward scheme that applies general tracking objectives across all data while injecting balance-specific priors only into synthetic motions. Further, an adaptive learning strategy with performance-driven sampling and motion-specific reward shaping enables efficient training across diverse motion distributions. We validate AMS extensively in simulation and on a real Unitree G1 humanoid. Experiments demonstrate that a single policy can execute agile skills such as dancing and running, while also performing zero-shot extreme balance motions like Ip Man's Squat, highlighting AMS as a versatile control paradigm for future humanoid applications.
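The two mechanisms named in the abstract, a hybrid reward and performance-driven sampling, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify reward terms or sampling rules, so the exponential kernels, the center-of-mass offset used as a balance prior, the `balance_weight` and `floor` parameters, and all function names are assumptions chosen only for illustration.

```python
import math
import random


def hybrid_reward(tracking_error, com_offset, is_synthetic, balance_weight=0.5):
    """Tracking term for every motion; balance prior only for synthetic clips.

    All terms and weights here are hypothetical stand-ins, not values from the paper.
    """
    tracking = math.exp(-tracking_error)  # general objective, applied to all data
    if not is_synthetic:
        return tracking
    balance = math.exp(-com_offset)  # assumed balance prior (center-of-mass offset)
    return tracking + balance_weight * balance


def sampling_probs(failure_rates, floor=0.05):
    """Assumed performance-driven rule: clips that fail more often get sampled more.

    The floor keeps well-tracked clips from dropping out of training entirely.
    """
    weights = [max(rate, floor) for rate in failure_rates]
    total = sum(weights)
    return [w / total for w in weights]


def sample_motion(failure_rates, rng=random):
    """Draw one motion index according to the performance-driven probabilities."""
    probs = sampling_probs(failure_rates)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In this sketch a MoCap clip is scored by the tracking term alone, while a synthetic balance clip receives the same tracking term plus the weighted balance prior, so the stability objective never constrains agile motions.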
