Natural Humanoid Robot Locomotion with Generative Motion Prior

Haodong Zhang, Liang Zhang, Zhenghan Chen, Lu Chen, Yue Wang, Rong Xiong

TL;DR

This work tackles natural, human-like locomotion for humanoid robots by introducing a Generative Motion Prior (GMP) that offline-trains a CVAE to predict future, human-like whole-body motions based on current pose and velocity commands. During RL, GMP serves as a frozen online motion generator, supplying dense trajectory-level supervision (joint angles and keypoint positions) to guide policy learning with rewards that combine motion guidance, task performance, and stability. The approach relies on whole-body motion retargeting to create realistic robot reference motions, and uses a command-conditioned latent encoder to align generated trajectories with user commands. Experimental results in simulation and on a real NAVIAI humanoid demonstrate improved motion naturalness and stable training compared to baselines, indicating GMP's effectiveness for sim-to-real humanoid locomotion. The method offers a scalable, interpretable path to integrating human-like motion into robot locomotion through the combination of generative priors and reinforcement learning.
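
As a rough illustration of how motion guidance, task performance, and stability could be combined into one reward, consider the minimal sketch below. It is not the paper's reward: the exponential tracking kernel, the weights (`w_joint`, `w_key`, `w_task`, `w_stab`), the scales, and the function names are all hypothetical choices made for this example. Only the idea of supervising both joint angles and keypoint positions against a generated reference comes from the summary above.

```python
import math


def tracking_term(actual, reference, scale):
    """Exponential kernel on squared error: 1.0 for a perfect match, decaying toward 0."""
    sq_err = sum((a - r) ** 2 for a, r in zip(actual, reference))
    return math.exp(-scale * sq_err)


def total_reward(joint_angles, ref_joint_angles, keypoints, ref_keypoints,
                 task_reward, stability_reward,
                 w_joint=1.0, w_key=1.0, w_task=1.0, w_stab=1.0,
                 joint_scale=2.0, key_scale=10.0):
    """Weighted sum of motion-guidance, task, and stability terms (all weights are assumed)."""
    # Flatten the (x, y, z) keypoints so both tracking terms share one helper.
    flat = [v for point in keypoints for v in point]
    ref_flat = [v for point in ref_keypoints for v in point]

    r_joint = tracking_term(joint_angles, ref_joint_angles, joint_scale)
    r_key = tracking_term(flat, ref_flat, key_scale)

    return (w_joint * r_joint + w_key * r_key
            + w_task * task_reward + w_stab * stability_reward)


# Perfect tracking gives both guidance terms their maximum value of 1.0.
perfect = total_reward([0.1, 0.2], [0.1, 0.2], [(0.0, 0.0, 1.0)], [(0.0, 0.0, 1.0)],
                       task_reward=0.5, stability_reward=0.25)
# A joint-angle error lowers the joint tracking term and therefore the total.
off = total_reward([0.3, 0.2], [0.1, 0.2], [(0.0, 0.0, 1.0)], [(0.0, 0.0, 1.0)],
                   task_reward=0.5, stability_reward=0.25)
```

Because each guidance term is computed per joint and per keypoint against an explicit reference, a low reward can be traced back to the body part that deviates, which is one way to read the interpretability claim.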

Abstract

Natural, lifelike locomotion remains a fundamental challenge for humanoid robots that must interact with human society. However, previous methods either neglect motion naturalness or rely on unstable and ambiguous style rewards. In this paper, we propose a novel Generative Motion Prior (GMP) that provides fine-grained motion-level supervision for the task of natural humanoid robot locomotion. To leverage natural human motions, we first employ whole-body motion retargeting to effectively transfer them to the robot. Subsequently, we train a generative model offline to predict future natural reference motions for the robot based on a conditional variational auto-encoder. During policy training, the generative motion prior serves as a frozen online motion generator, delivering precise and comprehensive supervision at the trajectory level, including joint angles and keypoint positions. The generative motion prior significantly enhances training stability and improves interpretability by offering detailed and dense guidance throughout the learning process. Experimental results in both simulation and real-world environments demonstrate that our method achieves superior motion naturalness compared to existing approaches. The project page can be found at https://sites.google.com/view/humanoid-gmp
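
For readers unfamiliar with conditional variational auto-encoders, the generic training objective has the shape shown below. This is the standard textbook CVAE loss, written with the pose and command symbols from the Figure 2 caption ($\boldsymbol{m}_{t}$, $\boldsymbol{c}_{t}$). The latent variable $\boldsymbol{z}$, encoder $q_\phi$, conditional prior $p_\psi$, decoder $D_\theta$, and weight $\beta$ are notation assumed here for illustration; the paper's exact loss terms and conditioning may differ.

$$
\mathcal{L}_{\text{CVAE}} = \mathbb{E}_{q_\phi(\boldsymbol{z} \mid \boldsymbol{m}_{t+1}, \boldsymbol{m}_{t}, \boldsymbol{c}_{t})} \left[ \left\| \boldsymbol{m}_{t+1} - D_\theta(\boldsymbol{z}, \boldsymbol{m}_{t}, \boldsymbol{c}_{t}) \right\|^2 \right] + \beta \, D_{\mathrm{KL}}\!\left( q_\phi(\boldsymbol{z} \mid \boldsymbol{m}_{t+1}, \boldsymbol{m}_{t}, \boldsymbol{c}_{t}) \,\|\, p_\psi(\boldsymbol{z} \mid \boldsymbol{m}_{t}, \boldsymbol{c}_{t}) \right)
$$

The first term asks the decoder to reconstruct the next pose. The second keeps the encoder's latent distribution close to a prior that does not see the future pose, so that at inference time $\boldsymbol{z}$ can be drawn from the prior alone.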

Paper Structure

This paper contains 16 sections, 13 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Illustration of different methods. (a) Pure RL only considers task objectives and neglects motion naturalness. (b) RL+AMP simultaneously trains the RL policy with the discriminator to provide an ambiguous scalar style reward and suffers from training instability. (c) Our method incorporates a frozen generative model to predict natural whole-body reference motion trajectories for the robot and provides more stable and detailed motion guidance.
  • Figure 2: Overall framework. (a) First, we transform the human motion dataset into a robot reference motion dataset by whole-body motion retargeting. (b) Utilizing the retargeted robot motions, we train a generative motion prior that predicts the natural robot pose $\boldsymbol{m}_{t+1}$ at the next frame based on the current robot pose $\boldsymbol{m}_{t}$ and the user velocity command $\boldsymbol{c}_{t}$. (c) The frozen generative motion prior performs online motion generation in an auto-regressive manner to synthesize the robot's future reference motion and guide the RL policy to learn natural locomotion.
  • Figure 3: Visualization of future robot reference motion predicted by the generative motion prior, with green dots indicating the keypoint positions for the subsequent twelve frames. (a) Trajectories of the left knee and ankle alongside the right elbow and wrist. (b) Trajectories of the right knee and ankle together with the left elbow and wrist.
  • Figure 4: Qualitative comparison with representative baselines. The pure RL method (SaW [van2024revisiting]) overlooks motion naturalness, resulting in unnatural bent-leg postures. The adversarial method (PBRS [jeon2023benchmarking] + AMP [peng2021amp]) learns straighter legs; however, it relies on ambiguous style rewards, which are insufficient to fully capture human-like motion characteristics. In contrast, our method exhibits superior motion naturalness.
  • Figure 5: Snapshots of natural and human-like humanoid robot locomotion in the real-world environment.
  • ...and 1 more figure
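
The auto-regressive generation described in the Figure 2(c) caption can be sketched as a simple loop: a one-step predictor maps the current pose and velocity command to the next pose, and its output is fed back in as the next input. The sketch below is a minimal illustration under stated assumptions. `rollout_reference` and `toy_predictor` are hypothetical names, the toy predictor is only a stand-in for the frozen generative model, and the step `dt = 0.02` is an arbitrary value. The default horizon of twelve frames mirrors the Figure 3 caption.

```python
from typing import Callable, List, Sequence

Pose = List[float]
Command = Sequence[float]


def rollout_reference(predict_next: Callable[[Pose, Command], Pose],
                      current_pose: Pose,
                      command: Command,
                      horizon: int = 12) -> List[Pose]:
    """Feed each predicted pose back into the predictor to build a future trajectory."""
    trajectory: List[Pose] = []
    pose = list(current_pose)  # copy so the caller's pose is not modified
    for _ in range(horizon):
        pose = predict_next(pose, command)
        trajectory.append(pose)
    return trajectory


def toy_predictor(pose: Pose, command: Command) -> Pose:
    """Hypothetical stand-in for the frozen model: shift each value by v_x * dt."""
    dt = 0.02  # assumed frame interval, not taken from the paper
    return [p + command[0] * dt for p in pose]


start_pose = [0.0, 0.5]
future = rollout_reference(toy_predictor, start_pose, command=[1.0, 0.0, 0.0])
print(len(future))  # number of predicted frames
```

Because the predictor is frozen, the same loop can run at every policy step during RL training without adding trainable parameters; only `predict_next` would be replaced by the trained model.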