Beyond Egocentric Limits: Multi-View Depth-Based Learning for Robust Quadrupedal Locomotion
Rémy Rahem, Wael Suleiman
TL;DR
This work tackles the fragility of egocentric perception in dynamic quadrupedal locomotion by introducing a multi-view depth-based framework that fuses onboard and remote depth streams through a teacher-student distillation pipeline. A dual-depth teacher policy is trained with privileged information and then distilled into a student policy that is made robust to remote-view dropout and misalignment through extensive domain randomization. Simulation results show that multi-view policies outperform single-view baselines in challenging tasks and remain stable when exocentric inputs are partially unavailable, with RD during training proving crucial for resilience. The approach supports aerial-ground cooperative sensing and enhances sim-to-real transfer, offering a practical path toward perception-rich, robust legged locomotion.
Abstract
Recent progress in legged locomotion has enabled highly dynamic, parkour-like behaviors in robots, similar to those of their biological counterparts. Yet these methods mostly rely on egocentric (first-person) perception, which limits their performance, especially when the robot's viewpoint is occluded. A promising solution is to enhance the robot's environmental awareness with complementary viewpoints, such as multiple actors exchanging perceptual information. Inspired by this idea, this work proposes a multi-view depth-based locomotion framework that combines egocentric and exocentric observations to provide richer environmental context during agile locomotion. Using a teacher-student distillation approach, the student policy learns to fuse proprioception with dual depth streams while remaining robust to real-world sensing imperfections. To further improve robustness, we introduce extensive domain randomization, including stochastic remote-camera dropouts and 3D positional perturbations that emulate aerial-ground cooperative sensing. Simulation results show that multi-viewpoint policies outperform a single-viewpoint baseline in gap crossing, step descent, and other dynamic maneuvers, while maintaining stability when the exocentric camera is partially or completely unavailable. Additional experiments show that moderate viewpoint misalignment is well tolerated when incorporated during training. This study demonstrates that heterogeneous visual feedback improves robustness and agility in quadrupedal locomotion. Furthermore, to support reproducibility, the implementation accompanying this work is publicly available at https://anonymous.4open.science/r/multiview-parkour-6FB8
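The two randomizations named in the abstract, stochastic remote-camera dropout and 3D positional perturbation, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the uniform offset distribution, the zero-filled dropped frame, and the default values (`max_offset=0.1`, `dropout_prob=0.3`) are all assumptions chosen for illustration.

```python
import numpy as np


def perturb_camera_position(nominal_pos, rng, max_offset=0.1):
    """Return the nominal 3D camera position plus a random offset.

    Assumption: each axis is offset independently and uniformly within
    [-max_offset, max_offset]; the paper's distribution may differ.
    """
    offset = rng.uniform(-max_offset, max_offset, size=3)
    return np.asarray(nominal_pos, dtype=np.float64) + offset


def apply_remote_dropout(remote_depth, rng, dropout_prob=0.3):
    """Blank the remote depth frame with probability dropout_prob.

    Assumption: a dropped frame is represented as an all-zero image.
    Returns the (possibly blanked) frame and a flag telling if it was dropped.
    """
    dropped = rng.random() < dropout_prob
    if dropped:
        return np.zeros_like(remote_depth), True
    return remote_depth, False


# Hypothetical usage inside one simulation step.
rng = np.random.default_rng(seed=0)
camera_pos = perturb_camera_position([1.0, 0.0, 2.0], rng)  # place camera, then render
remote_depth = np.ones((58, 87), dtype=np.float32)          # stand-in for a rendered frame
remote_depth, was_dropped = apply_remote_dropout(remote_depth, rng)
```

In a simulator, the perturbed position would be applied before rendering the exocentric depth image, while dropout is applied to the rendered frame before it reaches the policy.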
