Think on your feet: Seamless Transition between Human-like Locomotion in Response to Changing Commands

Huaxing Huang, Wenhao Cui, Tonghe Zhang, Shengtao Li, Jinchao Han, Bangyu Qin, Tianchu Zhang, Liang Zheng, Ziyang Tang, Chenxu Hu, Ning Yan, Jiahao Chen, Shipu Zhang, Zheyuan Jiang

TL;DR

This work addresses the problem of enabling seamless transitions between human-like locomotion in humanoid robots under changing velocity commands. It formulates locomotion control as a partially observable MDP and integrates a Hybrid Internal Model for velocity/state estimation, a Wasserstein-divergence discriminator to prevent mode collapse in imitation learning, and a curiosity bonus to encourage exploration, with motion retargeting from MoCap and domain randomization to bridge sim-to-real gaps. The approach yields a novel architecture that generalizes to unseen intermediate motions and transfers zero-shot from simulation to real hardware, validated across multiple terrains and robot platforms. The results demonstrate improved velocity tracking, more natural human-like gaits, and robust multitasking, highlighting practical potential for versatile humanoid locomotion in human-centric environments.

Abstract

While it is relatively easy to train humanoid robots to mimic specific locomotion skills, it is more challenging to learn from various motions and adhere to continuously changing commands. These robots must accurately track motion instructions, seamlessly transition between a variety of movements, and master intermediate motions not present in their reference data. In this work, we propose a novel approach that integrates human-like motion transfer with precise velocity tracking through a series of improvements to classical imitation learning. To enhance generalization, we employ the Wasserstein divergence criterion (WGAN-div). Furthermore, a Hybrid Internal Model provides structured estimates of hidden states and velocity to enhance locomotion stability and adaptability to the environment, while a curiosity bonus fosters exploration. Our comprehensive method promises highly human-like locomotion that adapts to varying velocity requirements, direct generalization to unseen motions and multitasking, as well as zero-shot transfer to the simulator and the real world across different terrains. These advancements are validated through simulations across various robot models and extensive real-world experiments.
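To make the Wasserstein divergence criterion mentioned in the abstract concrete, the following is a minimal sketch of a WGAN-div style critic loss in plain Python. It is not the authors' implementation. The tiny `tanh` critic, the helper names, the interpolation-based sampling of `x_hat`, and the defaults `k=2`, `p=6` (values commonly used with WGAN-div, not taken from this paper) are all illustrative assumptions. In an AMP-style setup the input vectors would stand for features of reference and policy state transitions, which is also an assumption here.

```python
import math
import random


def critic(x, W, v):
    """Hypothetical one-hidden-layer critic D(x) = v . tanh(W x).

    Returns the scalar score and the hidden activations (reused for the gradient).
    """
    h = [math.tanh(sum(w * xj for w, xj in zip(row, x))) for row in W]
    return sum(vi * hi for vi, hi in zip(v, h)), h


def critic_input_grad(h, W, v):
    """Analytic gradient of D with respect to its input x."""
    n_in = len(W[0])
    return [
        sum(v[i] * (1.0 - h[i] ** 2) * W[i][j] for i in range(len(W)))
        for j in range(n_in)
    ]


def wgan_div_critic_loss(real, fake, W, v, k=2.0, p=6.0, rng=random):
    """Critic loss in one common sign convention (to be minimised):

        E[D(fake)] - E[D(real)] + k * E[ ||grad_x D(x_hat)||^p ]

    x_hat is sampled on the segment between a real and a fake sample
    (an assumption; other sampling choices are possible).
    """
    d_real = sum(critic(x, W, v)[0] for x in real) / len(real)
    d_fake = sum(critic(x, W, v)[0] for x in fake) / len(fake)

    pairs = list(zip(real, fake))
    penalty = 0.0
    for x_r, x_f in pairs:
        a = rng.random()
        x_hat = [a * r + (1.0 - a) * f for r, f in zip(x_r, x_f)]
        _, h = critic(x_hat, W, v)
        grad = critic_input_grad(h, W, v)
        grad_norm = math.sqrt(sum(g * g for g in grad))
        penalty += grad_norm ** p
    penalty /= len(pairs)

    return d_fake - d_real + k * penalty


if __name__ == "__main__":
    W = [[0.5, -0.2], [0.1, 0.3]]
    v = [1.0, -1.0]
    reference = [[0.1, 0.2], [0.3, -0.1]]   # stand-in for MoCap transition features
    policy = [[0.0, 0.5], [-0.2, 0.4]]      # stand-in for policy transition features
    print(wgan_div_critic_loss(reference, policy, W, v))
```

Unlike a weight-clipped or strictly 1-Lipschitz critic, the gradient term here acts as a soft penalty on the critic's slope, which is the property usually credited with reducing mode collapse. A real training loop would compute the gradient with automatic differentiation rather than by hand.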

Paper Structure

This paper contains 18 sections, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Comprehensive demonstration of the Noetix N1 robot's locomotion skills learnt with the proposed method. The robot exhibits seamless and continuous transitions between highly human-like motion sets, accelerating from walking to running and then coming to a full stop. Top to bottom: performance in the Isaac Gym simulator, in MuJoCo, and in the real world. The PD controller parameters and the driving frequency are the same across the three settings.
  • Figure 2: Illustration of motion retargeting from expert data (Left) to our humanoid robot "Noetix N1" (Right). N1 weighs 23 kg and is 0.95 m tall, with 18 DoFs in total (four on each arm and five on each leg).
  • Figure 3: Illustration of the proposed human-like locomotion learning framework. The estimator extracts information from past observations, producing a velocity estimate with internal state representation. The velocity estimate ensures mobile stability, while contrastive learning promotes future observation prediction. Imitation learning secures human-like gaits, and a curiosity bonus fosters exploration. The bottom-left block is employed in real-world deployment.
  • Figure 4: Statistics of the style reward $r^{s}$ obtained during training indicate the diversity of joint positions in the output actions. The RL module HIM aids in capturing a broader reward distribution (Left) with a larger standard deviation (Right). Wasserstein divergence (AMPw) captures multiple peaks, a feature not provided by the vanilla AMP design. Also note that while curiosity bonus reduces deviation, it shifts the distribution towards a higher-reward region.
  • Figure 5: Training curves of four algorithms. Vanilla AMP (red curve) fails to obtain satisfactory task rewards due to its limited generalization. RL-aided methods consistently outperform the AMP baseline.
  • ...and 5 more figures