Agility Meets Stability: Versatile Humanoid Control with Heterogeneous Data

Yixuan Pan, Ruoyi Qiao, Li Chen, Kashyap Chitta, Liang Pan, Haoguang Mai, Qingwen Bu, Hao Zhao, Cunyuan Zheng, Ping Luo, Hongyang Li

TL;DR

AMS presents a unified humanoid control framework that merges agile dynamic motion tracking with extreme balance maintenance by leveraging heterogeneous data sources: human MoCap trajectories for agility and synthetic balance motions for stability. It employs a hybrid reward structure and adaptive learning (adaptive sampling and reward shaping) to reconcile conflicting objectives and boost training efficiency. In both simulation and real-world tests on a Unitree G1, a single policy achieves both expressive motions (dancing, running) and zero-shot balance tasks (Ip Man's Squat), outperforming baselines. The work advances versatile, real-world humanoid control and teleoperation, while noting limitations in end-effector manipulation and RGB-based pose estimation, suggesting future work on precise manipulation and online retargeting.

Abstract

Humanoid robots are envisioned to perform a wide range of tasks in human-centered environments, requiring controllers that combine agility with robust balance. Recent advances in locomotion and whole-body tracking have enabled impressive progress in either agile dynamic skills or stability-critical behaviors, but existing methods remain specialized, focusing on one capability while compromising the other. In this work, we introduce AMS (Agility Meets Stability), the first framework that unifies both dynamic motion tracking and extreme balance maintenance in a single policy. Our key insight is to leverage heterogeneous data sources: human motion capture datasets that provide rich, agile behaviors, and physically constrained synthetic balance motions that capture stability configurations. To reconcile the divergent optimization goals of agility and stability, we design a hybrid reward scheme that applies general tracking objectives across all data while injecting balance-specific priors only into synthetic motions. Further, an adaptive learning strategy with performance-driven sampling and motion-specific reward shaping enables efficient training across diverse motion distributions. We validate AMS extensively in simulation and on a real Unitree G1 humanoid. Experiments demonstrate that a single policy can execute agile skills such as dancing and running, while also performing zero-shot extreme balance motions like Ip Man's Squat, highlighting AMS as a versatile control paradigm for future humanoid applications.
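The hybrid reward idea described in the abstract (general tracking objectives for every motion, balance-specific priors only for synthetic motions) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `MotionStep` fields, the exponential reward kernels, the `sigma` tolerances, and the `w_balance` weight are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class MotionStep:
    # All fields are hypothetical stand-ins for quantities a simulator would provide.
    tracking_error: float  # mean body-position error vs. the reference motion (m)
    com_offset: float      # horizontal distance from center of mass to support foot (m)
    is_synthetic: bool     # True if the reference comes from the synthetic balance set


def general_tracking_reward(step: MotionStep, sigma: float = 0.25) -> float:
    """Tracking term applied to every motion, MoCap or synthetic (assumed kernel)."""
    return math.exp(-((step.tracking_error / sigma) ** 2))


def balance_prior_reward(step: MotionStep, sigma: float = 0.05) -> float:
    """Balance prior: reward keeping the center of mass over the support foot (assumed form)."""
    return math.exp(-((step.com_offset / sigma) ** 2))


def hybrid_reward(step: MotionStep, w_balance: float = 0.5) -> float:
    """General reward for all data; balance prior added only for synthetic motions."""
    reward = general_tracking_reward(step)
    if step.is_synthetic:
        reward += w_balance * balance_prior_reward(step)
    return reward


if __name__ == "__main__":
    mocap = MotionStep(tracking_error=0.1, com_offset=0.02, is_synthetic=False)
    synth = MotionStep(tracking_error=0.1, com_offset=0.02, is_synthetic=True)
    print(hybrid_reward(mocap), hybrid_reward(synth))
```

Gating the balance term on `is_synthetic` reflects the abstract's stated motivation: agile MoCap motions are not penalized for leaving a statically stable configuration, while synthetic balance motions receive the extra stability signal.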

Paper Structure

This paper contains 15 sections, 8 equations, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: Introducing AMS (Agility Meets Stability), one single policy that performs diverse motions with stability and agility simultaneously on a humanoid robot. The robot can execute challenging balance motions such as (a) Ip Man's Squat, a Kung Fu-style single-leg squat, unseen during training (zero-shot); (b) single-leg balance stances which humans find hard to perform; (c) balanced stretching; as well as expressive motions and high-mobility movements with precise control, such as (d) dancing and (e) running. More examples are provided in the appended video.
  • Figure 2: Overview of AMS. (a) The general whole-body tracking pipeline retargets human MoCap data to reference motions and adopts a teacher-student-based strategy for reinforcement learning. To address data limitations and conflicting optimization objectives, AMS introduces three key components. (b) Synthetic balance data is generated to complement human MoCap data and address data limitations. (c) Adaptive learning is employed with adaptive sampling and reward shaping based on individual motion performance. (d) Hybrid rewards are designed with general rewards for all motions and balance prior rewards exclusively for synthetic motions.
  • Figure 3: Motion space analysis of human data and generated balance data. (a) Humans and humanoids feature distinctive balance motion spaces, leading to limited reference motions for training whole-body balancing skills. (b) Sensor noise and kinematic retargeting errors greatly affect the reference motion quality from human MoCap data. (c) Constrained synthetic balance data guarantees physical realism, such as the foot contact state and center of mass. (d) Example of a generated synthetic balance motion, with the swinging foot trajectory shown in green.
  • Figure 4: RGB camera-based real-time teleoperation.
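
The adaptive sampling component mentioned in the Figure 2 caption (sampling driven by individual motion performance) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the paper's formula: the per-motion error estimates, the proportional-to-error rule, and the `floor` term are hypothetical.

```python
import random
from typing import List


def sampling_probabilities(errors: List[float], floor: float = 0.05) -> List[float]:
    """Turn per-motion tracking errors into sampling probabilities.

    Assumed rule: probability is proportional to (error + floor), so poorly
    tracked motions are sampled more often while well-tracked motions are
    never dropped entirely.
    """
    scores = [max(e, 0.0) + floor for e in errors]
    total = sum(scores)
    return [s / total for s in scores]


def sample_motion(errors: List[float], rng: random.Random) -> int:
    """Draw the index of the next reference motion to train on."""
    probs = sampling_probabilities(errors)
    return rng.choices(range(len(errors)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical running error estimates for three reference motions.
    errors = [0.05, 0.40, 0.0]
    print(sampling_probabilities(errors))
    print(sample_motion(errors, random.Random(0)))
```

The floor keeps every motion in rotation, which is one simple way to avoid forgetting motions that are already tracked well; other schedules are equally plausible.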