Table of Contents
Fetching ...

Learning Humanoid Locomotion with World Model Reconstruction

Wandong Sun, Long Chen, Yongbo Su, Baoshi Cao, Yang Liu, Zongwu Xie

TL;DR

This work tackles robust humanoid locomotion on real-world terrains by introducing World Model Reconstruction (WMR), which explicitly reconstructs world state from noisy sensor histories and uses it as the sole input to the locomotion policy. The estimator, value network, and policy are trained jointly with a gradient-cutoff that prevents policy updates from biasing the world-state reconstruction, while a motion-capture-derived command space guides natural, human-like commands. Real-world deployment over ice, snow, and deformable ground demonstrates 3.2 km of autonomous traversal with strong resistance to perturbations and low-friction conditions, outperforming baselines in reconstruction and episode metrics (e.g., $E_{vel}$, $E_{ang}$, $E_{recon}$, $M_{terrain}$, $M_{reward}$). Key contributions include the first explicit world-state reconstruction framework for humanoid locomotion, a simple yet effective gradient-cutoff mechanism that enhances reconstruction accuracy by ~$40\%$, and the use of a motion-capture command space to improve command tracking. The findings indicate that WMR enables robust sim-to-real transfer and superior performance across challenging terrains, with potential extensions to terrain height perception and broader real-world deployment.

Abstract

Humanoid robots are designed to navigate environments accessible to humans using their legs. However, classical research has primarily focused on controlled laboratory settings, resulting in a gap in developing controllers for navigating complex real-world terrains. This challenge mainly arises from the limitations and noise in sensor data, which hinder the robot's understanding of itself and the environment. In this study, we introduce World Model Reconstruction (WMR), an end-to-end learning-based approach for blind humanoid locomotion across challenging terrains. We propose training an estimator to explicitly reconstruct the world state and utilize it to enhance the locomotion policy. The locomotion policy takes inputs entirely from the reconstructed information. The policy and the estimator are trained jointly; however, the gradient between them is intentionally cut off. This ensures that the estimator focuses solely on world reconstruction, independent of the locomotion policy's updates. We evaluated our model on rough, deformable, and slippery surfaces in real-world scenarios, demonstrating robust adaptability and resistance to interference. The robot successfully completed a 3.2 km hike without any human assistance, mastering terrains covered with ice and snow.

Learning Humanoid Locomotion with World Model Reconstruction

TL;DR

This work tackles robust humanoid locomotion on real-world terrains by introducing World Model Reconstruction (WMR), which explicitly reconstructs world state from noisy sensor histories and uses it as the sole input to the locomotion policy. The estimator, value network, and policy are trained jointly with a gradient-cutoff that prevents policy updates from biasing the world-state reconstruction, while a motion-capture-derived command space guides natural, human-like commands. Real-world deployment over ice, snow, and deformable ground demonstrates 3.2 km of autonomous traversal with strong resistance to perturbations and low-friction conditions, outperforming baselines in reconstruction and episode metrics (e.g., , , , , ). Key contributions include the first explicit world-state reconstruction framework for humanoid locomotion, a simple yet effective gradient-cutoff mechanism that enhances reconstruction accuracy by ~, and the use of a motion-capture command space to improve command tracking. The findings indicate that WMR enables robust sim-to-real transfer and superior performance across challenging terrains, with potential extensions to terrain height perception and broader real-world deployment.

Abstract

Humanoid robots are designed to navigate environments accessible to humans using their legs. However, classical research has primarily focused on controlled laboratory settings, resulting in a gap in developing controllers for navigating complex real-world terrains. This challenge mainly arises from the limitations and noise in sensor data, which hinder the robot's understanding of itself and the environment. In this study, we introduce World Model Reconstruction (WMR), an end-to-end learning-based approach for blind humanoid locomotion across challenging terrains. We propose training an estimator to explicitly reconstruct the world state and utilize it to enhance the locomotion policy. The locomotion policy takes inputs entirely from the reconstructed information. The policy and the estimator are trained jointly; however, the gradient between them is intentionally cut off. This ensures that the estimator focuses solely on world reconstruction, independent of the locomotion policy's updates. We evaluated our model on rough, deformable, and slippery surfaces in real-world scenarios, demonstrating robust adaptability and resistance to interference. The robot successfully completed a 3.2 km hike without any human assistance, mastering terrains covered with ice and snow.

Paper Structure

This paper contains 32 sections, 7 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Deployment to outdoor environments. We deployed the model in an outdoor environment covered in ice and snow. Our controller can successfully traverse a range of terrains, including rough, gravel, sloping, and deep snow terrains. It can also resist impact from humans and maintain stability on low-friction surfaces.
  • Figure 2: Illustration of the World Model Reconstruction framework. Our framework explicitly reconstructs world state from noisy sensor history and uses it as the sole input to the locomotion policy. The framework is driven by reconstruction loss, value function loss and policy gradient, innovatively applies gradient cutoff between the estimator and the locomotion policy to enhance the reconstruction accuracy. After one stage of training procedure, the robot can perform zero-shot sim-to-real transfer to challenging terrains.
  • Figure 3: The robot is trained over a variety of terrains with random friction and restitution.
  • Figure 4: We eval the baselines on the reconstruction accuracy of the world state. The radar plot shows the mean reconstruction accuracy of each metric. The green area represents the WMR framework, the blue area represents the Denoising World Model Learning framework, and the purple area represents the WMR framework without gradient cutoff.
  • Figure 5: Comparison of ground truth and reconstruction. The ground truth data comes from the noise-free data in the simulator. The WMR framework receives the noisy data and predicts the world state.
  • ...and 3 more figures