No More Blind Spots: Learning Vision-Based Omnidirectional Bipedal Locomotion for Challenging Terrain

Mohitvishnu S. Gadde, Pranay Dugar, Ashish Malik, Alan Fern

Abstract

Effective bipedal locomotion in dynamic environments, such as cluttered indoor spaces or uneven terrain, requires agile and adaptive movement in all directions. This necessitates omnidirectional terrain sensing and a controller capable of processing such input. We present a learning framework for vision-based omnidirectional bipedal locomotion, enabling seamless movement using depth images. A key challenge is the high computational cost of rendering omnidirectional depth images in simulation, making traditional sim-to-real reinforcement learning (RL) impractical. Our method combines a robust blind controller with a teacher policy that supervises a vision-based student policy, trained on noise-augmented terrain data to avoid rendering costs during RL and ensure robustness. We also introduce a data augmentation technique for supervised student training, accelerating training by up to 10 times compared to conventional methods. Our framework is validated through simulation and real-world tests, demonstrating effective omnidirectional locomotion with minimal reliance on expensive rendering. This is, to the best of our knowledge, the first demonstration of vision-based omnidirectional bipedal locomotion, showcasing its adaptability to diverse terrains.

Paper Structure

This paper contains 17 sections, 1 equation, 4 figures, 3 tables.

Figures (4)

  • Figure 1: The omnidirectional robot locomotion controller is trained in simulation via student-teacher learning to proactively adjust its gait to challenging terrain based on input from four depth cameras. The controller is then transferred to the real robot.
  • Figure 2: Hierarchical network architecture for the proposed vision-based omnidirectional locomotion controller. (A) The policy consists of a frozen, pretrained blind policy that outputs a base action and a trainable modulator, conditioned on perception, that outputs a modulating action. The final action is the sum of both outputs. (B) The teacher model uses a privileged height map encoded by an MLP. (C) The student model uses egocentric multi-view images processed by a ResNet-18-based encoder.
  • Figure 3: Terrain types used in training.
  • Figure 4: Bar plots comparing four policy variants (Blind Policy, Blind Policy + Terrain, Privileged Teacher, and Ours (Student)) across four terrain types (easy, ridge-hard, stair-hard, block-hard). The plots show (1) Success Rate, defined as completing a 10-second rollout without falling; (2) Episodes with Collision, which counts any episode with a foot or conrod collision event; (3) Terminations due to Foot Collision, measured by checking whether a foot or conrod collision occurred in the final 100 steps of a terminated episode; and (4) Energy Consumption, reported as average cumulative joint effort in kilojoules. All metrics are averaged over 100 episodes per terrain variant and shown with 95% confidence intervals.
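
The action composition described in the Figure 2 caption (frozen blind base action plus a perception-conditioned modulating action) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the class name `HierarchicalPolicy`, the toy callables, the two-dimensional action, and the choice to feed proprioception to the modulator alongside the perception latent are all assumptions made for the example. In the paper the encoder is an MLP over a height map (teacher) or a ResNet-18-based image encoder (student).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class HierarchicalPolicy:
    """Hypothetical wrapper: frozen blind policy + trainable modulator."""

    blind_policy: Callable[[Sequence[float]], Vector]  # frozen, pretrained
    encoder: Callable[[Sequence[float]], Vector]       # perception -> latent
    # Assumption: the modulator also sees proprioception, not only the latent.
    modulator: Callable[[Sequence[float], Sequence[float]], Vector]

    def act(self, proprio: Sequence[float], perception: Sequence[float]) -> Vector:
        base = self.blind_policy(proprio)        # base action (no terrain info)
        latent = self.encoder(perception)        # terrain embedding
        delta = self.modulator(proprio, latent)  # modulating action
        # Final action is the element-wise sum of base and modulating actions.
        return [b + d for b, d in zip(base, delta)]


# Toy stand-ins (all hypothetical) so the sketch runs end to end.
def toy_blind_policy(proprio: Sequence[float]) -> Vector:
    return list(proprio)  # identity map as a placeholder base action


def toy_height_encoder(heights: Sequence[float]) -> Vector:
    return [sum(heights) / len(heights)]  # mean terrain height as a 1-D latent


def toy_modulator(proprio: Sequence[float], latent: Sequence[float]) -> Vector:
    return [0.5 * latent[0], -0.5 * latent[0]]


def zero_modulator(proprio: Sequence[float], latent: Sequence[float]) -> Vector:
    return [0.0 for _ in proprio]


policy = HierarchicalPolicy(toy_blind_policy, toy_height_encoder, toy_modulator)
action = policy.act(proprio=[0.5, -0.5], perception=[0.25, 0.75])

# With a zero modulator the composite policy reduces to the blind policy.
blind_only = HierarchicalPolicy(toy_blind_policy, toy_height_encoder, zero_modulator)
fallback_action = blind_only.act(proprio=[0.5, -0.5], perception=[0.25, 0.75])
```

One property of this additive structure is visible in the last two lines: when the modulator outputs zeros, the composite policy falls back to the blind controller's behavior, so perception only needs to learn a correction on top of an already functional gait.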