Learning Perceptive Humanoid Locomotion over Challenging Terrain
Wandong Sun, Baoshi Cao, Long Chen, Yongbo Su, Yang Liu, Zongwu Xie, Hong Liu
TL;DR
The paper addresses the fragility of humanoid locomotion on challenging terrain, which stems from controllers that rely solely on proprioception and from noise in exteroceptive perception. It proposes the Humanoid Perception Controller (HPC), a two-stage teacher–student framework in which an oracle policy trained on privileged, noise-free data guides a student policy; the student learns a denoising world model with a variational information bottleneck and imitates the oracle via DAgger. Key contributions include integrating height-map–driven terrain perception with sensor denoising, a variational world model trained by ELBO optimization with annealing, domain randomization, and real-world validation showing robust traversal of varied outdoor terrain. The approach improves velocity tracking and terrain negotiation, sustains performance under strong perception noise, and enables reliable outdoor humanoid operation without external intervention.
Abstract
Humanoid robots are engineered to navigate terrains akin to those encountered by humans, which necessitates human-like locomotion and perceptual abilities. Currently, the most reliable controllers for humanoid motion rely exclusively on proprioception, a reliance that becomes both dangerous and unreliable when coping with rugged terrain. Although the integration of height maps into perception can enable proactive gait planning, robust utilization of this information remains a significant challenge, especially when exteroceptive perception is noisy. To surmount these challenges, we propose a solution based on a teacher-student distillation framework. In this paradigm, an oracle (teacher) policy accesses noise-free data to establish an optimal reference policy, while the student policy not only imitates the teacher's actions but also simultaneously trains a world model with a variational information bottleneck for sensor denoising and state estimation. Extensive evaluations demonstrate that our approach markedly enhances performance in scenarios characterized by unreliable terrain estimations. Moreover, in rigorous tests in both challenging urban settings and off-road environments, the model successfully traversed 2 km of varied terrain without external intervention.
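
The student objective described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact loss, so the squared-error imitation and reconstruction terms, the diagonal-Gaussian latent with a standard-normal prior, the linear annealing schedule, and all function and parameter names (`student_loss`, `kl_weight`, `anneal_steps`, `beta_max`) are assumptions made for illustration.

```python
import numpy as np


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def kl_weight(step, anneal_steps, beta_max=1.0):
    """Hypothetical linear annealing: the KL weight ramps from 0 to beta_max."""
    return beta_max * min(1.0, step / anneal_steps)


def student_loss(student_action, oracle_action, recon, clean_target,
                 mu, log_var, beta):
    """Assumed student objective: imitation + denoising + weighted KL.

    student_action, oracle_action: (batch, action_dim)
    recon, clean_target:           (batch, obs_dim), decoder output vs.
                                   noise-free observation
    mu, log_var:                   (batch, latent_dim), encoder outputs
    """
    # Imitation term: match the oracle's action on the same state.
    imitation = np.mean(np.sum((student_action - oracle_action) ** 2, axis=-1))
    # Denoising term: reconstruct the clean signal from the noisy input.
    reconstruction = np.mean(np.sum((recon - clean_target) ** 2, axis=-1))
    # Bottleneck term: keep the latent close to the prior.
    kl = np.mean(kl_to_standard_normal(mu, log_var))
    return imitation + reconstruction + beta * kl
```

Under DAgger, the batch passed to `student_loss` would come from states visited by the student's own rollouts and relabeled with the oracle's actions, rather than from oracle rollouts alone. The reconstruction and KL terms together form a negative ELBO up to constants (assuming a Gaussian decoder), and ramping `beta` from zero lets the world model learn to reconstruct before the bottleneck starts compressing the latent.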
