Learning Humanoid Locomotion with Perceptive Internal Model
Junfeng Long, Junli Ren, Moji Shi, Zirui Wang, Tao Huang, Ping Luo, Jiangmiao Pang
TL;DR
Humanoid locomotion requires perception to cope with unstable morphology and diverse terrains, so perception-free ("blind") policies are insufficient. The authors propose the Perceptive Internal Model (PIM), which combines a robot-centered LiDAR elevation map and terrain-aware state estimation with the Hybrid Internal Model (HIM) to train locomotion policies. Training in simulation is efficient (reported at about 3 hours on an RTX 4090), and the resulting policy transfers zero-shot to hardware, performing robustly on stairs, gaps, and platforms on two humanoid robots (Unitree H1 and Fourier GR-1). The reported results include cross-platform validation and natural whole-body movement, and the authors position the approach as a scalable foundation for perceptive humanoid control.
Abstract
In contrast to quadruped robots that can navigate diverse terrains using a "blind" policy, humanoid robots require accurate perception for stable locomotion due to their high degrees of freedom and inherently unstable morphology. However, incorporating perceptual signals often introduces additional disturbances to the system, potentially reducing its robustness, generalizability, and efficiency. This paper presents the Perceptive Internal Model (PIM), which relies on onboard, continuously updated elevation maps centered around the robot to perceive its surroundings. We train the policy using ground-truth obstacle heights surrounding the robot in simulation, optimizing it based on the Hybrid Internal Model (HIM), and perform inference with heights sampled from the constructed elevation map. Unlike previous methods that directly encode depth maps or raw point clouds, our approach allows the robot to perceive the terrain beneath its feet clearly and is less affected by camera movement or noise. Furthermore, since depth map rendering is not required in simulation, our method introduces minimal additional computational costs and can train the policy in 3 hours on an RTX 4090 GPU. We verify the effectiveness of our method across multiple humanoid robots, indoor and outdoor terrains including stairs, and different sensor configurations. Our method can enable a humanoid robot to continuously climb stairs and has the potential to serve as a foundational algorithm for the development of future humanoid control methods.
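To make the idea of "heights sampled from the constructed elevation map" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the elevation map is a 2D NumPy array indexed by world x and y, and that heights are reported relative to the robot base. The function name `sample_heights`, its parameters, and the grid spacing are all hypothetical choices made for illustration.

```python
import numpy as np

def sample_heights(elev_map, resolution, origin_xy, robot_xy, robot_yaw, robot_z, xs, ys):
    """Sample terrain heights on a robot-centered grid (illustrative sketch).

    elev_map   : 2D array of terrain heights, indexed [ix, iy] (assumed layout)
    resolution : cell size in meters
    origin_xy  : world coordinates of the corner of cell [0, 0]
    xs, ys     : sample offsets in the robot frame, in meters
    """
    # Build the grid of sample points in the robot frame.
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    pts = np.stack([gx.ravel(), gy.ravel()], axis=1)

    # Rotate by the robot yaw and translate to the world frame.
    c, s = np.cos(robot_yaw), np.sin(robot_yaw)
    rot = np.array([[c, -s], [s, c]])
    world = pts @ rot.T + np.asarray(robot_xy)

    # Convert world coordinates to cell indices, clamped to the map bounds.
    idx = np.floor((world - np.asarray(origin_xy)) / resolution).astype(int)
    idx[:, 0] = np.clip(idx[:, 0], 0, elev_map.shape[0] - 1)
    idx[:, 1] = np.clip(idx[:, 1], 0, elev_map.shape[1] - 1)

    # Look up heights and express them relative to the base (assumed convention).
    heights = elev_map[idx[:, 0], idx[:, 1]]
    return (heights - robot_z).reshape(gx.shape)

# Hypothetical 4 m x 4 m map with a 0.2 m step starting at world x = 0.5 m.
elev_map = np.zeros((40, 40))
elev_map[25:, :] = 0.2
forward = sample_heights(elev_map, 0.1, (-2.0, -2.0), (0.0, 0.0), 0.0, 1.0,
                         xs=[0.05, 0.65], ys=[0.0])
turned = sample_heights(elev_map, 0.1, (-2.0, -2.0), (0.0, 0.0), np.pi / 2, 1.0,
                        xs=[0.05, 0.65], ys=[0.0])
```

In this sketch the same lookup could serve both phases described in the abstract: in simulation the array would hold ground-truth terrain heights, while on hardware it would hold the onboard elevation map. The policy input keeps the same shape in either case.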
