EGM: Efficiently Learning General Motion Tracking Policy for High Dynamic Humanoid Whole-Body Control

Chao Yang, Yingkai Sun, Peng Ye, Xin Chen, Chong Yu, Tao Chen

TL;DR

EGM tackles the challenge of learning a universal humanoid motion-tracking policy from limited high-quality data. It introduces four core designs: Bin-based Cross-motion Curriculum Adaptive Sampling (BCCAS), a Composite Decoupled Mixture-of-Experts (CDMoE) architecture, data-quality–driven data curation, and a three-stage curriculum training flow. The resulting policy is trained on only 4.08 hours of data, generalizes to 49.25 hours of test motions, and outperforms baselines on both routine and highly dynamic tasks. This work offers a practical pathway toward robust, generalizable humanoid control under real-world variability.

Abstract

Learning a general motion tracking policy from human motions shows great potential for versatile humanoid whole-body control. Conventional approaches are not only inefficient in data utilization and training but also exhibit limited performance when tracking highly dynamic motions. To address these challenges, we propose EGM, a framework that enables efficient learning of a general motion tracking policy. EGM integrates four core designs. First, we introduce a Bin-based Cross-motion Curriculum Adaptive Sampling (BCCAS) strategy that dynamically adjusts sampling probabilities based on the tracking error of each motion bin, efficiently balancing training across motions of varying difficulty and duration. The sampled data is then processed by our proposed Composite Decoupled Mixture-of-Experts (CDMoE) architecture, which improves the ability to track motions from different distributions by grouping experts separately for the upper and lower body and by decoupling orthogonal experts from shared experts, so that dedicated features and general features are handled separately. Central to our approach is a key insight: for training a general motion tracking policy, data quality and diversity are paramount. Building on these designs, we develop a three-stage curriculum training flow that progressively enhances the policy's robustness against disturbances. Despite training on only 4.08 hours of data, EGM generalizes robustly across 49.25 hours of test motions, outperforming baselines on both routine and highly dynamic tasks.
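The bin-based adaptive sampling idea in the abstract can be illustrated with a minimal sketch. The abstract only states that sampling probabilities follow the tracking error of each motion bin; everything else here is an assumption for illustration: the fixed bin length, the exponential moving average of errors, the `temperature` exponent, the uniform `floor`, and all function names are hypothetical and not taken from the paper.

```python
import random


def make_bins(durations_s, bin_len_s=2.0):
    """Split each motion into fixed-length bins (assumed scheme).

    Returns a list of (motion_id, start_time_s). Longer motions yield
    more bins, so sampling per bin accounts for motion duration.
    """
    bins = []
    for motion_id, duration in enumerate(durations_s):
        start = 0.0
        while start < duration:
            bins.append((motion_id, start))
            start += bin_len_s
    return bins


def update_bin_error(old_error, new_error, momentum=0.9):
    """Smooth the per-bin tracking error with an exponential moving average."""
    return momentum * old_error + (1.0 - momentum) * new_error


def bin_sampling_probs(bin_errors, temperature=1.0, floor=0.05):
    """Turn per-bin tracking errors into sampling probabilities.

    Bins with higher error receive more probability mass. `floor` mixes in
    a uniform distribution so well-tracked bins are still revisited.
    """
    if not bin_errors:
        raise ValueError("need at least one bin")
    n = len(bin_errors)
    weights = [max(e, 0.0) ** (1.0 / temperature) for e in bin_errors]
    total = sum(weights)
    if total == 0.0:
        return [1.0 / n] * n
    return [(1.0 - floor) * w / total + floor / n for w in weights]


def sample_bin(probs, rng):
    """Draw one bin index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In a training loop under these assumptions, each environment reset would call `sample_bin` to pick a starting bin, and the error measured on that rollout would be folded back into the bin's running estimate with `update_bin_error` before the probabilities are recomputed.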

Paper Structure

This paper contains 19 sections, 6 figures, 3 tables.

Figures (6)

  • Figure 1: We deploy a unified student policy trained with EGM in the simulation environment, achieving highly robust and stable tracking of highly dynamic motions, including (a) running and sudden stop, (b) hurdling, (c) throwing, (d) quick spinning, and (e) spinning jumps.
  • Figure 2: Overview of the EGM framework. First, large-scale Mocap datasets are retargeted to the humanoid, and a small dataset of diverse, high-quality motions is selected. Next, training is conducted in the simulation environment: a teacher policy based on the CDMoE architecture is trained with two-stage curriculum reinforcement learning, and in Stage 3 the teacher policy is distilled into a deployable student policy. Throughout training, the Bin-based Cross-motion Curriculum Adaptive Sampling strategy provides dynamic sampling, improving training efficiency.
  • Figure 3: Expert weight distribution across different motion types, showing how our CDMoE architecture assigns different weights to experts for various motion categories. Slashes mark upper-body experts.
  • Figure 4: $E_{\text{mpkpe}}$ across motion durations. With BCCAS, the policy shows a clear advantage in tracking long-sequence motions.
  • Figure 5: Distribution of the sample ratio in the training data, together with typical motions at different sample ratios. Motions with a larger sample ratio tend to be more difficult, indicating that BCCAS successfully concentrates sampling on more difficult motions.
  • ...and 1 more figures
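
The per-group expert weighting mentioned in the Figure 3 caption can be pictured with a rough sketch. It assumes much that the text above does not specify: softmax gating, linear experts, random initial weights, and one always-active shared expert per body group. The "dedicated" experts stand in for the orthogonal experts of CDMoE, but no orthogonality constraint is enforced here. All class and parameter names are hypothetical, and this is not the authors' implementation.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weight, x):
    """Multiply a weight matrix (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def random_matrix(rows, cols, rng):
    """Small random weights, used only to make the sketch runnable."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class BodyGroupMoE:
    """One expert group: a shared expert plus gated dedicated experts."""

    def __init__(self, in_dim, out_dim, num_dedicated, rng):
        self.shared = random_matrix(out_dim, in_dim, rng)
        self.dedicated = [random_matrix(out_dim, in_dim, rng)
                          for _ in range(num_dedicated)]
        self.gate = random_matrix(num_dedicated, in_dim, rng)

    def forward(self, x):
        gate_weights = softmax(linear(self.gate, x))
        out = linear(self.shared, x)  # general features, always active
        for g, expert in zip(gate_weights, self.dedicated):
            y = linear(expert, x)     # dedicated features, gated
            out = [o + g * yi for o, yi in zip(out, y)]
        return out, gate_weights


class CompositeMoE:
    """Separate expert groups for the upper and lower body (assumed layout)."""

    def __init__(self, in_dim, upper_dofs, lower_dofs, num_dedicated=3, seed=0):
        rng = random.Random(seed)
        self.upper = BodyGroupMoE(in_dim, upper_dofs, num_dedicated, rng)
        self.lower = BodyGroupMoE(in_dim, lower_dofs, num_dedicated, rng)

    def forward(self, x):
        upper_out, upper_gate = self.upper.forward(x)
        lower_out, lower_gate = self.lower.forward(x)
        # Concatenate per-group outputs into one action vector.
        return upper_out + lower_out, {"upper": upper_gate, "lower": lower_gate}
```

Under these assumptions, averaging the returned gate weights over observations from each motion category would give a per-category expert weight distribution of the kind Figure 3 describes.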