MOVE: Multi-skill Omnidirectional Legged Locomotion with Limited View in 3D Environments
Songbo Li, Shixin Luo, Jun Wu, Qiuguo Zhu
TL;DR
The paper addresses omnidirectional legged locomotion for low-cost robots with limited egocentric vision. It introduces MOVE, a one-stage end-to-end framework built on a pseudo-siamese network (PS-Net) that combines reconstruction and contrastive learning to infer unseen surroundings from a cube-map privileged representation, enabling robust motion across 3D terrains. The architecture comprises a standard input encoder, a surroundings encoder, a policy network, and a value network, with an asymmetric attention mechanism and a mixed supervision objective. Experiments in simulation and on a real Lite3 robot show that the policy handles forward and omnidirectional tasks (jumps, climbs, crawls) even under depth noise and partial occlusion, indicating successful sim-to-real transfer. This work broadens the operational scope of egocentric-vision quadrupeds and provides a practical path toward real-time omnidirectional locomotion in challenging 3D environments.
Abstract
Legged robots possess inherent advantages in traversing complex 3D terrains. However, previous work on low-cost quadruped robots with egocentric vision systems has been limited by a narrow front-facing view and exteroceptive noise, restricting omnidirectional mobility in such environments. While building a voxel map through a hierarchical structure can refine exteroception processing, it introduces significant computational overhead, noise, and delays. In this paper, we present MOVE, a one-stage end-to-end learning framework capable of multi-skill omnidirectional legged locomotion with limited view in 3D environments, much as animals move. When movement aligns with the robot's line of sight, exteroceptive perception enhances locomotion, enabling extreme climbing and leaping. When vision is obstructed or the direction of movement lies outside the robot's field of view, the robot relies on proprioception for tasks like crawling and climbing stairs. We integrate all these skills into a single neural network by introducing a pseudo-siamese network structure that combines supervised and contrastive learning, which helps the robot infer its surroundings beyond its field of view. Experiments in both simulation and real-world scenarios demonstrate the robustness of our method, broadening the operational environments for robots with egocentric vision.
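To make the idea of combining supervised and contrastive learning concrete, the following is a minimal NumPy sketch of one way such a mixed objective could be written. It is an illustration under stated assumptions, not the objective used in MOVE: the InfoNCE form of the contrastive term, the mean-squared-error reconstruction term, the `temperature` and `weight` values, and all function and variable names are hypothetical. Here `student_latent` stands for an embedding computed from the limited onboard view and `teacher_latent` for an embedding of the privileged surroundings representation.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(student, teacher, temperature=0.1):
    """Contrastive term (assumed InfoNCE form).

    Row i of `student` is treated as the positive pair of row i of
    `teacher`; every other row in the batch acts as a negative.
    """
    s = l2_normalize(student)
    t = l2_normalize(teacher)
    logits = s @ t.T / temperature                       # shape (B, B)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def mixed_loss(student_latent, teacher_latent, reconstruction, target, weight=0.5):
    """Supervised reconstruction error plus a weighted contrastive term.

    `weight` is a hypothetical balancing coefficient.
    Returns (total, reconstruction_term, contrastive_term).
    """
    recon = float(np.mean((reconstruction - target) ** 2))
    contrast = info_nce(student_latent, teacher_latent)
    return recon + weight * contrast, recon, contrast


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_student = rng.normal(size=(8, 16))                     # limited-view embeddings
    z_teacher = z_student + 0.05 * rng.normal(size=(8, 16))  # privileged embeddings
    target = rng.normal(size=(8, 6))                         # stand-in privileged target
    prediction = target + 0.1 * rng.normal(size=(8, 6))
    total, recon, contrast = mixed_loss(z_student, z_teacher, prediction, target)
    print(f"total={total:.4f} recon={recon:.4f} contrast={contrast:.4f}")
```

In this sketch the reconstruction term pulls the predicted surroundings toward the privileged target directly, while the contrastive term only requires that each limited-view embedding be closer to its own privileged embedding than to the others in the batch.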
