KiVi: Kinesthetic-Visuospatial Integration for Dynamic and Safe Egocentric Legged Locomotion

Peizhuo Li, Hongyi Li, Yuxuan Ma, Linnan Chang, Xinrong Yang, Ruiqi Yu, Yifeng Zhang, Yuhong Cao, Qiuguo Zhu, Guillaume Sartoretti

TL;DR

KiVi addresses the fragility of vision-based legged locomotion by explicitly separating proprioceptive and visual pathways and enriching their fusion with a memory-augmented transformer. The framework uses an asymmetric actor–critic with a Kinesthetic Module and a Visuospatial Module to provide stable control while selectively leveraging vision for obstacle avoidance and terrain understanding, even under out-of-distribution visual disturbances. Empirical results show robust sim-to-real transfer, strong performance on diverse outdoor terrains, and graceful fallback to proprioception when vision is unreliable. This approach offers a practical, robust solution for real-world legged locomotion in visually challenging environments.
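
The fallback to proprioception described above can be illustrated with a minimal sketch. For illustration only, it assumes that the proprioceptive embedding serves as both the attention query and the backbone, and that visual tokens enter through a gated residual cross-attention. The function `fuse` and the `gate` reliability weight are hypothetical and are not taken from the paper. When no visual tokens are available, or the gate is zero, the output reduces to the proprioceptive embedding.

```python
import math
from typing import List, Optional

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(proprio: Vector, visual_tokens: Optional[List[Vector]], gate: float = 1.0) -> Vector:
    """Hypothetical fusion: the proprioceptive embedding is the query and the
    backbone; visual tokens contribute only through a gated residual term."""
    if not visual_tokens or gate == 0.0:
        # Fallback: without usable vision, return the proprioceptive embedding unchanged.
        return list(proprio)
    scale = 1.0 / math.sqrt(len(proprio))
    weights = softmax([dot(proprio, tok) * scale for tok in visual_tokens])
    # Attention-weighted average of the visual tokens.
    attended = [sum(w * tok[i] for w, tok in zip(weights, visual_tokens))
                for i in range(len(proprio))]
    # Residual connection keeps proprioception as the backbone.
    return [p + gate * a for p, a in zip(proprio, attended)]
```

In this sketch, driving `gate` toward zero when vision is judged unreliable recovers a purely proprioceptive signal; how KiVi actually weights the visual pathway is not specified here.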

Abstract

Vision-based locomotion has shown great promise in enabling legged robots to perceive and adapt to complex environments. However, visual information is inherently fragile: it is vulnerable to occlusions, reflections, and lighting changes, which often destabilize locomotion. Inspired by animal sensorimotor integration, we propose KiVi, a Kinesthetic-Visuospatial integration framework, in which kinesthetics encodes proprioceptive sensing of body motion and visuospatial reasoning captures visual perception of the surrounding terrain. Specifically, KiVi separates these pathways, using proprioception as a stable backbone while selectively incorporating vision for terrain awareness and obstacle avoidance. This modality-balanced yet integrative design, combined with memory-enhanced attention, allows the robot to interpret visual cues robustly while retaining proprioception as a stable fallback. Extensive experiments show that our method enables quadruped robots to traverse diverse terrains stably and operate reliably in unstructured outdoor environments, remaining robust to out-of-distribution (OOD) visual noise and occlusions unseen during training, which highlights its effectiveness and applicability to real-world legged locomotion.
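
Memory-enhanced attention can be sketched as the current embedding attending over a fixed-length buffer of past embeddings. This is a generic single-head illustration in pure Python, not KiVi's transformer: the class `MemoryAttention`, the `memory_len` parameter, and the use of the raw input as query, key, and value (no learned projections) are all simplifying assumptions.

```python
import math
from collections import deque
from typing import Deque, List

Vector = List[float]


class MemoryAttention:
    """Hypothetical single-head attention over a rolling memory of past embeddings."""

    def __init__(self, memory_len: int) -> None:
        # deque with maxlen evicts the oldest entry automatically once full.
        self.memory: Deque[Vector] = deque(maxlen=memory_len)

    def step(self, x: Vector) -> Vector:
        # Keys and values: stored past embeddings plus the current one.
        context = list(self.memory) + [x]
        scale = 1.0 / math.sqrt(len(x))
        scores = [sum(a * b for a, b in zip(x, k)) * scale for k in context]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention-weighted sum of the context vectors.
        out = [sum(w * v[i] for w, v in zip(weights, context)) for i in range(len(x))]
        self.memory.append(x)  # remember this step for future queries
        return out
```

Calling `step` once per control cycle lets each output draw on a short history, which is one way temporal context could smooth over brief visual dropouts; the paper's actual memory mechanism may differ.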

Paper Structure

This paper contains 23 sections, 5 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Robust locomotion and obstacle avoidance of the DEEP Robotics Lite3 across diverse terrains and under severe visual disturbances, achieved using our proposed KiVi framework.
  • Figure 2: Overview of the KiVi framework. The bio-inspired dual-branch estimator consists of the Kinesthetic Module (highlighted in yellow) and the Visuospatial Module (highlighted in gray), focusing on proprioceptive information and the integration of visual inputs, respectively. Solid lines indicate components that are deployed on the real robot, while dashed lines denote parts used only during simulation training. Red lines represent gradient blocking between modules during training.
  • Figure 3: Simulated terrain types A–F used during training, each representing a distinct terrain challenge. As training difficulty increases, 0–5 obstacles are randomly placed on each terrain. Panel G lists the control parameters and their ranges for each terrain type, which are used to procedurally generate diverse terrain instances.
  • Figure 4: Total joint power and power variance across all joints for KiVi, KiVi w/o Kin., and HIMLoco on simulated rough terrains under severe visual disturbances.
  • Figure 5: Outdoor hardware experiments under low visual disturbances, including tree roots, staircases, elevated platforms, and dynamic pedestrians. With only a constant forward velocity command of $[1.0, 0, 0]$, the robot traversed all terrains and avoided obstacles.
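
The two metrics in Figure 4 can be computed per timestep as sketched below. The summary does not give their definitions, so the following are assumptions: per-joint mechanical power is taken as $|\tau_i \dot{q}_i|$ (joint torque times joint velocity), total joint power as the sum over joints, and power variance as the population variance of the per-joint powers. The function name `joint_power_metrics` is hypothetical.

```python
from statistics import pvariance
from typing import List, Tuple


def joint_power_metrics(torques: List[float], velocities: List[float]) -> Tuple[float, float]:
    """Return (total joint power, variance of per-joint power) for one timestep.

    Assumes per-joint power |tau_i * qdot_i| and population variance across joints.
    """
    if len(torques) != len(velocities):
        raise ValueError("torques and velocities must have the same length")
    powers = [abs(t * v) for t, v in zip(torques, velocities)]
    return sum(powers), pvariance(powers)
```

Lower total power suggests more energy-efficient motion, and lower variance suggests load is spread more evenly across joints; how the paper aggregates these values over time is not stated here.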