Learning Perceptive Humanoid Locomotion over Challenging Terrain

Wandong Sun, Baoshi Cao, Long Chen, Yongbo Su, Yang Liu, Zongwu Xie, Hong Liu

TL;DR

The paper addresses the fragility of humanoid locomotion on challenging terrains caused by reliance on proprioception and perception noise. It proposes Humanoid Perception Controller (HPC), a two-stage teacher–student framework where an oracle policy trained on privileged, noise-free data guides a student policy that learns a denoising world model using a variational information bottleneck and imitates the oracle via DAgger. Key contributions include integrating height-map–driven terrain perception with sensor denoising, a variational world model with ELBO optimization and annealing, domain randomization, and real-world validation showing robust traversal of varied outdoor terrains. The approach yields improved velocity tracking, terrain negotiation, and sustained performance under strong perception noise, enabling reliable outdoor humanoid operation without external intervention.

Abstract

Humanoid robots are engineered to navigate terrains akin to those encountered by humans, which necessitates human-like locomotion and perceptual abilities. Currently, the most reliable controllers for humanoid motion rely exclusively on proprioception, a reliance that becomes both dangerous and unreliable when coping with rugged terrain. Although the integration of height maps into perception can enable proactive gait planning, robust utilization of this information remains a significant challenge, especially when exteroceptive perception is noisy. To surmount these challenges, we propose a solution based on a teacher-student distillation framework. In this paradigm, an oracle policy accesses noise-free data to establish an optimal reference policy, while the student policy not only imitates the teacher's actions but also simultaneously trains a world model with a variational information bottleneck for sensor denoising and state estimation. Extensive evaluations demonstrate that our approach markedly enhances performance in scenarios characterized by unreliable terrain estimations. Moreover, we conducted rigorous testing in both challenging urban settings and off-road environments, where the model successfully traversed 2 km of varied terrain without external intervention.
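
The student objective described above combines three ingredients: imitating the oracle's actions, reconstructing a clean state from noisy input, and regularizing the latent code with a variational information bottleneck. The following is a minimal sketch of how such a loss could be assembled, not the authors' implementation. The function names, the squared-error terms, the standard-normal prior, the unit weighting between imitation and reconstruction, and the linear annealing schedule are all assumptions made for illustration.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def annealed_beta(step, warmup_steps, beta_max):
    """Hypothetical schedule: ramp the KL weight linearly from 0 to beta_max."""
    return beta_max * min(1.0, step / warmup_steps)


def student_loss(student_action, oracle_action, decoded_state, clean_state,
                 mu, log_var, beta):
    """Assumed student objective: imitation + reconstruction + beta * KL."""
    # Imitation term: the oracle labels the states the student visits (DAgger-style).
    imitation = mse(student_action, oracle_action)
    # Denoising term: the world model decodes a clean state from noisy observations.
    reconstruction = mse(decoded_state, clean_state)
    # Bottleneck term: keeps the latent posterior close to the assumed prior.
    kl = kl_to_standard_normal(mu, log_var)
    return imitation + reconstruction + beta * kl
```

In this sketch, the reconstruction and KL terms together play the role of a negative ELBO, and `annealed_beta` raises the KL weight gradually so the encoder is not pushed toward the prior before it has learned to denoise.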

Paper Structure

This paper contains 24 sections, 9 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Deployment to outdoor environments. We deployed the model on challenging outdoor terrains. Our controller can successfully traverse a range of terrains, including stair, discrete, rough, gravel, sloping, and deep snow terrains. Videos are available at https://www.youtube.com/watch?v=-47gm15wbYA.
  • Figure 2: Training of Humanoid Perception Controller consists of two stages: (1) Oracle Policy Training generates a reference policy using noise-free privileged data, (2) Student Policy Training employs a world model with a variational information bottleneck for sensor denoising while imitating oracle actions through teacher-student distillation. During deployment, only the encoder and policy network are retained for real-world execution.
  • Figure 3: An intuitive display of terrain noise, where the red dots are the actual terrain heights and the green dots are the terrain heights after adding noise.
  • Figure 4: The robot is trained over a variety of terrains.
  • Figure 5: Real-world stair ascent comparison between our approach (top) and baseline (bottom). Our approach successfully overcomes high levels of noise and scales multiple levels, while the baseline controller suffers from catastrophic failure.
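
The kind of height-map corruption shown in Figure 3 can be illustrated with a small sketch. The listing above does not specify the paper's noise model, so the version below is an assumption: it adds one shared vertical offset per map plus independent Gaussian jitter per sample point. The function name and the default standard deviations (in meters) are hypothetical.

```python
import random


def add_height_noise(heights, rng, point_std=0.02, offset_std=0.05):
    """Return a noisy copy of a 2D height map (list of rows, heights in meters).

    Assumed noise model, not taken from the paper:
      - one vertical offset shared by the whole map (e.g. odometry drift)
      - independent Gaussian jitter on every sample point
    """
    offset = rng.gauss(0.0, offset_std)
    return [[h + offset + rng.gauss(0.0, point_std) for h in row] for row in heights]


# Usage: clean heights correspond to the red dots in Figure 3, noisy ones to the green dots.
clean = [[0.0, 0.1], [0.2, 0.3]]
noisy = add_height_noise(clean, random.Random(0))
```

Passing a seeded `random.Random` instance keeps the corruption reproducible, which is convenient when comparing controllers under the same noise realization.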