Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning

Jiyuan Shi, Xinzhe Liu, Dewei Wang, Ouyang Lu, Sören Schwertfeger, Chi Zhang, Fuchun Sun, Chenjia Bai, Xuelong Li

TL;DR

ALMI addresses the challenge of human-like whole-body coordination by decoupling learning for locomotion and motion imitation through an adversarial training framework. It introduces a dual-curriculum, two-player Markov-game approach and the ALMI-X dataset to support language-guided, end-to-end humanoid control and foundation-model research. Empirical results in simulation and on the Unitree H1-2 demonstrate improved robustness and tracking accuracy over baselines, and the ALMI-X dataset enables preliminary foundation-model exploration. The work also highlights promising avenues for end-to-end humanoid control while acknowledging limitations in highly dynamic tasks and sim-to-real data efficiency, suggesting future improvements in unified rewards and data-efficient foundation models.

Abstract

Humans exhibit diverse and expressive whole-body movements. However, attaining human-like whole-body coordination in humanoid robots remains challenging, as conventional approaches that mimic whole-body motions often neglect the distinct roles of the upper and lower body. This oversight leads to computationally intensive policy learning and frequently causes robot instability and falls during real-world execution. To address these issues, we propose Adversarial Locomotion and Motion Imitation (ALMI), a novel framework that enables adversarial policy learning between the upper and lower body. Specifically, the lower-body policy aims to provide robust locomotion that follows velocity commands while the upper body tracks various motions. Conversely, the upper-body policy ensures effective motion tracking while the robot executes velocity-based movements. Through iterative updates, these policies achieve coordinated whole-body control, which can be extended to loco-manipulation tasks with teleoperation systems. Extensive experiments demonstrate that our method achieves robust locomotion and precise motion tracking both in simulation and on the full-size Unitree H1 robot. Additionally, we release a large-scale whole-body motion control dataset featuring high-quality episodic trajectories from MuJoCo simulations that are deployable on real robots. The project page is https://almi-humanoid.github.io.
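The abstract describes an alternating scheme: one body part's policy is trained while the other is frozen and driven by adversarially chosen inputs, then the roles swap. The loop below is a hypothetical toy, not the authors' implementation. Each "policy" is a single scalar gain in [0, 1], the reward, the constants (`LAM`, the candidate sets), and the helpers `reward`, `update`, and `train` are invented for illustration, and plain gradient ascent stands in for PPO.

```python
# Toy sketch of ALMI-style alternating adversarial updates (hypothetical).
# Each "policy" is one scalar gain in [0, 1]; the paper trains neural
# policies with PPO in simulation, and only the loop structure is kept here.

LAM = 0.5                               # hypothetical cost of a large gain
COMMANDS = [-1.0, -0.5, 0.0, 0.5, 1.0]  # candidate velocity commands c^l
MOTIONS = [-1.0, -0.5, 0.0, 0.5, 1.0]   # candidate upper-body motions g^u


def reward(own: float, frozen: float, x: float) -> float:
    """Toy reward for the learning policy.

    The frozen policy executes the adversarial input x; a larger frozen gain
    transmits less disturbance, and a larger own gain rejects more of it.
    """
    d = x * (1.0 - frozen)
    return -(d ** 2) * (1.0 - own) ** 2 - LAM * own ** 2


def reward_grad(own: float, frozen: float, x: float) -> float:
    """Analytic derivative of `reward` with respect to `own`."""
    d = x * (1.0 - frozen)
    return 2.0 * d ** 2 * (1.0 - own) - 2.0 * LAM * own


def update(own: float, frozen: float, candidates: list[float],
           steps: int = 50, lr: float = 0.2) -> float:
    """One policy update: keep the other player fixed, pick the worst-case
    input from `candidates`, then run gradient ascent on the learner's reward."""
    x_adv = min(candidates, key=lambda x: reward(own, frozen, x))
    for _ in range(steps):
        own += lr * reward_grad(own, frozen, x_adv)
    return own


def train(rounds: int = 40) -> tuple[float, float]:
    upper, lower = 0.0, 0.0
    for _ in range(rounds):
        # (a) update the upper body against adversarial velocity commands
        #     executed by the frozen lower-body policy
        upper = update(upper, lower, COMMANDS)
        # (b) update the lower body against adversarial upper-body motions
        #     executed by the frozen upper-body policy
        lower = update(lower, upper, MOTIONS)
    return upper, lower


if __name__ == "__main__":
    print(train())
```

In this toy, each best response is a contraction, so the alternating rounds settle at a shared fixed point: both gains approach roughly 0.41, the root of $(1-x)^3 = 0.5x$. Choosing the worst-case candidate mirrors the adversarial sampling of $\bm c^l_{\rm adv}$ and $\bm g^u_{\rm adv}$; the paper's curricula are not modeled.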

Paper Structure

This paper contains 48 sections, 2 theorems, 29 equations, 10 figures, 15 tables, and 1 algorithm.

Key Result

Theorem 3.1

Given $\epsilon>0$, suppose each policy uses an $\varepsilon$-greedy exploration scheme, with exploration factors $\varepsilon_x \asymp \epsilon$ and $\varepsilon_y \asymp \epsilon^2$ for the two players, and that both players follow independent policy gradient updates under a specific two-timescale rule for their learning rates. Then, after $N$ episodes, the resulting policies form an $\epsilon$-approximate Nash equilibrium.
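Theorem 3.1 relies on two standard notions: an $\varepsilon$-greedy policy, which mixes a policy with the uniform distribution so every action keeps some probability, and an $\epsilon$-approximate Nash equilibrium, where neither player can gain more than $\epsilon$ by deviating. The sketch below illustrates both in the simplest setting, a single-state zero-sum matrix game. This setting is an assumption made for illustration: the theorem concerns Markov games, and the two-timescale learning-rate rule is not modeled. The helper names `eps_greedy`, `duality_gap`, and `is_eps_nash` are hypothetical.

```python
# Sketch: epsilon-greedy policies and the epsilon-approximate Nash check,
# specialized (as an illustrative assumption) to a zero-sum matrix game.

Matrix = list[list[float]]


def eps_greedy(policy: list[float], eps: float) -> list[float]:
    """Mix a policy with the uniform distribution; every action keeps
    probability at least eps / n."""
    n = len(policy)
    return [(1.0 - eps) * p + eps / n for p in policy]


def duality_gap(A: Matrix, x: list[float], y: list[float]) -> float:
    """Gap of the profile (x, y) when x minimizes and y maximizes x^T A y.

    max_{y'} x^T A y' - min_{x'} x'^T A y is always >= 0 and equals 0
    exactly at a Nash equilibrium of the zero-sum game.
    """
    rows, cols = len(A), len(A[0])
    col_vals = [sum(x[i] * A[i][j] for i in range(rows)) for j in range(cols)]
    row_vals = [sum(A[i][j] * y[j] for j in range(cols)) for i in range(rows)]
    return max(col_vals) - min(row_vals)


def is_eps_nash(A: Matrix, x: list[float], y: list[float], eps: float) -> bool:
    """True if no player can gain more than eps by deviating."""
    return duality_gap(A, x, y) <= eps


if __name__ == "__main__":
    pennies = [[1.0, -1.0], [-1.0, 1.0]]  # matching pennies
    uniform = [0.5, 0.5]
    pure = [1.0, 0.0]
    print(duality_gap(pennies, uniform, uniform))  # 0.0 (exact equilibrium)
    print(duality_gap(pennies, pure, pure))        # 2.0
    mixed = eps_greedy(pure, 0.5)                  # [0.75, 0.25]
    print(duality_gap(pennies, mixed, mixed))      # 1.0
```

The gap is the usual yardstick for approximate equilibria in zero-sum games: it is zero only at an exact equilibrium and shrinks as both players' deviation incentives shrink.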

Figures (10)

  • Figure 1: Overview of ALMI. (a) When updating the upper-body policy $\pi^u$, we sample an adversarial velocity command $\bm c^l_{\rm adv}$ and obtain the lower-body action $\bm a^l_{\rm adv}$. We then use $(\bm a^u, \bm a^l_{\rm adv})$ to interact with the environment and collect experience, which is used to update $\pi^u$ with the PPO algorithm [ppo]. (b) Similarly, when updating the lower-body policy $\pi^l$, we sample an adversarial motion $\bm g^u_{\rm adv}$ and obtain $\bm a^u_{\rm adv}$. We then use $(\bm a^l, \bm a^u_{\rm adv})$ to interact with the environment and update $\pi^l$. The two policies $(\pi^l,\pi^u)$ eventually converge after multiple rounds of alternating updates.
  • Figure 2: ALMI-X dataset and foundation model training. (a) ALMI-X pairs the motion targets and velocity commands of the learned policies with language descriptions. (b) The foundation model learns $P(\hat{a}_{i+1}|s_{\leq i},a_{\leq i},\mathcal{T})$ from a segment of state-action pairs via causal attention. At inference time, the last predicted action is executed given the history, and the resulting true state is used for the next steps.
  • Figure 3: Sim-to-real comparison of the humanoid robot tracking various motions.
  • Figure 4: Percentage of steps for different categories of motions before and after data augmentation.
  • Figure 5: Visualization of the robot's $x$-$y$ coordinates at each step in the dataset. The data are down-sampled for visualization.
  • ...and 5 more figures
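Figure 2(b) trains the foundation model autoregressively with causal attention. The sketch below shows only the masking idea, under stated assumptions: a hypothetical token layout [T, s_0, a_0, s_1, a_1, ...] with text, state, and action embeddings already projected to a common width, and a single attention head. The paper's actual architecture, tokenization, and action head are not described here, and `interleave` and `causal_self_attention` are illustrative names.

```python
import numpy as np


def interleave(text_emb: np.ndarray, states: np.ndarray,
               actions: np.ndarray) -> np.ndarray:
    """Build the token sequence [T, s_0, a_0, s_1, a_1, ...].

    Hypothetical layout: all embeddings are assumed to share width d.
    """
    tokens = [text_emb]
    for s, a in zip(states, actions):
        tokens.extend([s, a])
    return np.stack(tokens)


def causal_self_attention(x: np.ndarray, wq: np.ndarray, wk: np.ndarray,
                          wv: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention with a causal mask:
    the output at position t depends only on tokens 0..t."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    n = x.shape[0]
    future = np.triu(np.ones((n, n), dtype=bool), k=1)  # strictly upper = future
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)  # masked entries become exactly 0
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, steps = 8, 3
    seq = interleave(rng.normal(size=d), rng.normal(size=(steps, d)),
                     rng.normal(size=(steps, d)))
    wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
    out = causal_self_attention(seq, wq, wk, wv)
    print(seq.shape, out.shape)
```

Because each position attends only to earlier tokens, a prediction head on these outputs conditions on the history alone, in line with the factorization $P(\hat{a}_{i+1}|s_{\leq i},a_{\leq i},\mathcal{T})$. During rollout, the predicted action would be executed and the observed true state appended as the next token.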

Theorems & Definitions (3)

  • Theorem 3.1
  • Theorem A.2: Restatement of Theorem 3.1
  • Proof