Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning

Xinyang Gu; Yen-Jen Wang; Xiang Zhu; Chengming Shi; Yanjiang Guo; Yichen Liu; Jianyu Chen

TL;DR

This work presents Denoising World Model Learning (DWL), an end-to-end reinforcement learning framework for humanoid locomotion that addresses the sim-to-real gap with an encoder-decoder world model and domain randomization. DWL enables a single learned policy to traverse real-world terrains (snow, stairs, irregular surfaces) via zero-shot sim-to-real transfer and active 2-DoF ankle control through a Closed Kinematic Chain mechanism. The approach combines a denoising loss, PPO-based policy optimization, and privileged information during training, and yields a robust gait across indoor and outdoor environments and under substantial disturbances. The results show improved terrain adaptation, state estimation, and ankle-assisted stability, with practical implications for deploying humanoid robots in human-centric settings.

Abstract

Humanoid robots, with their human-like skeletal structure, are especially suited for tasks in human-centric environments. However, this structure is accompanied by additional challenges in locomotion controller design, especially in complex real-world environments. As a result, existing humanoid robots are limited to relatively simple terrains, either with model-based control or model-free reinforcement learning. In this work, we introduce Denoising World Model Learning (DWL), an end-to-end reinforcement learning framework for humanoid locomotion control, which demonstrates the world's first humanoid robot to master real-world challenging terrains such as snowy and inclined land in the wild, up and down stairs, and extremely uneven terrains. All scenarios run the same learned neural network with zero-shot sim-to-real transfer, indicating the superior robustness and generalization capability of the proposed method.

Paper Structure (35 sections, 8 equations, 12 figures, 8 tables)

Introduction
Related Works
Learning Robot Locomotion
Humanoid Robot Locomotion Control
Problem Setting
Reinforcement Learning Background
Humanoid Robot Hardware
Methods
Denoising World Model Learning
Encoder-Decoder Architecture of DWL
Policy Learning in DWL
Formulating the DWL Loss Function
Reward Formulation
Composition of Rewards
Quintic Polynomial Foot Trajectory Interpolation
...and 20 more sections

Figures (12)

Figure 1: Extensive showcase of locomotion skills using the proposed framework. Displayed is a sequence illustrating a humanoid robot skillfully executing various locomotion tasks in challenging real-world environments.
Figure 2: Illustration of the humanoid robot's hardware structure and the Closed Kinematic Chain Ankle Mechanism. This mechanism provides two degrees of freedom in each ankle while reducing leg inertia. Our method is tested on two sizes of humanoid robot, XBot-S and XBot-L, provided by Robot Era.
Figure 3: Illustration of the Denoising World Model Learning framework. This diagram details the information flow from sensory input to action output in both simulated and real-world settings. Raw observations are generated by applying masking and domain randomization (DR) noise to privileged observations. They are then encoded into a latent state and decoded to reconstruct the true state via a denoising process.
Figure 4: Dynamic adaptation of the ankle control mechanism. A) The top image demonstrates the humanoid robot's ankle control system actively maintaining balance on uneven terrain. The associated torque plot reveals the control system's adjustments during steady locomotion. B) The bottom image shows the system's resilience to external perturbations during static standing, where the 2-DoF ankle control plays a key role in maintaining stability.
Figure 5: State estimation results as DWL traverses and adapts to complex terrain. This sequence of images visualizes the model's predictions of foot contact, base velocity, and heightmap as the humanoid robot navigates a slope and stairs. The results demonstrate DWL's effectiveness in state estimation and online adaptation.
...and 7 more figures
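
The information flow described in the Figure 3 caption (privileged observation, then masking and noise, then encoding to a latent state, then decoding back to the true state) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the single-layer encoder and decoder, the dimensions, the noise scale, the mean-squared-error loss, and every function name are assumptions chosen for brevity. Masking is modeled here as simply dropping the state entries the robot cannot sense.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(privileged, keep_idx, noise_std, rng):
    """Build a raw observation: keep only observable entries (masking),
    then add Gaussian noise as a stand-in for domain-randomization noise."""
    observed = privileged[..., keep_idx]
    return observed + rng.normal(0.0, noise_std, size=observed.shape)

def encode(raw_obs, w_enc):
    """Map the noisy, partial observation to a latent state (assumed 1-layer)."""
    return np.tanh(raw_obs @ w_enc)

def decode(latent, w_dec):
    """Reconstruct the full privileged state from the latent (assumed linear)."""
    return latent @ w_dec

def denoising_loss(privileged, raw_obs, w_enc, w_dec):
    """Mean squared error between the reconstruction and the clean state."""
    recon = decode(encode(raw_obs, w_enc), w_dec)
    return float(np.mean((recon - privileged) ** 2))

# Hypothetical sizes: 12-D privileged state, of which 8 entries are observable.
state_dim, latent_dim, batch = 12, 6, 32
keep_idx = np.arange(8)

privileged = rng.normal(size=(batch, state_dim))
raw_obs = corrupt(privileged, keep_idx, noise_std=0.05, rng=rng)

w_enc = rng.normal(scale=0.1, size=(keep_idx.size, latent_dim))
w_dec = rng.normal(scale=0.1, size=(latent_dim, state_dim))

latent = encode(raw_obs, w_enc)
loss = denoising_loss(privileged, raw_obs, w_enc, w_dec)
print(raw_obs.shape, latent.shape, loss)
```

In a training loop, a loss of this kind would be minimized alongside the PPO objective so that the latent state fed to the policy retains information about the unobserved true state; how the two terms are weighted is not specified in this summary.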
