VILENS: Visual, Inertial, Lidar, and Leg Odometry for All-Terrain Legged Robots

David Wisth, Marco Camurri, Maurice Fallon

TL;DR

VILENS addresses robust localization for legged robots in challenging environments where individual sensors can become degenerate. It is a factor-graph-based framework that tightly fuses IMU, leg kinematics, vision, and lidar data, and it adds a linear velocity bias term, estimated online, to compensate for the leg odometry drift caused by contact deformation and slippage. Compared with a state-of-the-art loosely coupled baseline, it reduces translational error by 62% and rotational error by 51% on average across several ANYmal platforms and terrains, and it remains robust in dark, slippery, and feature-poor environments. It runs onboard with a fixed-lag smoother and was integrated with a perceptive controller and a local path planner. The resulting high-frequency, low-drift state estimates are suitable for terrain mapping and autonomous navigation, marking a practical step toward reliable all-terrain legged autonomy.

Abstract

We present the visual inertial lidar legged navigation system (VILENS), an odometry system for legged robots based on factor graphs. The key novelty is the tight fusion of four different sensor modalities to achieve reliable operation when the individual sensors would otherwise produce degenerate estimation. To minimize leg odometry drift, we extend the robot's state with a linear velocity bias term, which is estimated online. This bias is observable because of the tight fusion of this preintegrated velocity factor with vision, lidar, and inertial measurement unit (IMU) factors. Extensive experimental validation on different ANYmal quadruped robots is presented, for a total duration of 2 h and 1.8 km traveled. The experiments involved dynamic locomotion over loose rocks, slopes, and mud, which caused challenges such as slippage and terrain deformation. Perceptual challenges included dark and dusty underground caverns, and open and feature-deprived areas. We show an average improvement of 62% in translational error and 51% in rotational error compared to a state-of-the-art loosely coupled approach. To demonstrate its robustness, VILENS was also integrated with a perceptive controller and a local path planner.
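The abstract's claim that the velocity bias becomes observable through fusion can be illustrated with a one-dimensional toy problem. The sketch below is not the VILENS implementation: it assumes a constant bias $b$, a constant true speed, Gaussian noise, and a hypothetical exteroceptive sensor (standing in for vision or lidar) that reports the displacement between keyframes. Under those assumptions the bias is the least-squares minimizer of the residuals $r_i = \Delta p_{\text{leg},i} - b\,T_i - \Delta p_{\text{ext},i}$. All names and numeric values are illustrative.

```python
import random


def simulate(true_bias=0.05, dt=0.0025, steps_per_keyframe=40,
             keyframes=50, seed=0):
    """Simulate a 1D robot; all parameters are toy assumptions.

    Returns a list of (leg_delta, ext_delta, interval) per keyframe pair
    and the true distance traveled.
    """
    rng = random.Random(seed)
    samples = []
    true_distance = 0.0
    for _ in range(keyframes):
        leg_delta = 0.0
        true_delta = 0.0
        for _ in range(steps_per_keyframe):
            v_true = 0.5  # constant forward speed in m/s (assumed)
            # Leg odometry velocity is corrupted by a constant bias and noise.
            v_leg = v_true + true_bias + rng.gauss(0.0, 0.02)
            leg_delta += v_leg * dt
            true_delta += v_true * dt
        # Exteroceptive displacement: unbiased, with small noise (assumed).
        ext_delta = true_delta + rng.gauss(0.0, 0.001)
        samples.append((leg_delta, ext_delta, steps_per_keyframe * dt))
        true_distance += true_delta
    return samples, true_distance


def estimate_velocity_bias(samples):
    """Closed-form least squares for b in leg_delta - b * T = ext_delta."""
    numerator = sum((leg - ext) * interval for leg, ext, interval in samples)
    denominator = sum(interval * interval for _, _, interval in samples)
    return numerator / denominator


def integrate_leg_odometry(samples, bias=0.0):
    """Integrate leg odometry after subtracting a velocity bias estimate."""
    return sum(leg - bias * interval for leg, _, interval in samples)


samples, true_distance = simulate()
bias_estimate = estimate_velocity_bias(samples)
raw_drift = abs(integrate_leg_odometry(samples) - true_distance)
corrected_drift = abs(
    integrate_leg_odometry(samples, bias_estimate) - true_distance)
print(f"estimated bias: {bias_estimate:.4f} m/s")
print(f"drift without correction: {raw_drift:.4f} m")
print(f"drift with correction: {corrected_drift:.4f} m")
```

Leg odometry alone cannot separate the bias from true motion in this toy model; the second, unbiased displacement source is what makes it recoverable. A full system would estimate the bias jointly with the rest of the state rather than in closed form.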

Paper Structure

This paper contains 51 sections, 49 equations, 18 figures, 5 tables.

Figures (18)

  • Figure 1: VILENS has been tested on a variety of platforms. Top-Left: ANYmal B300 at the Fire Service College in Moreton-On-Marsh (UK). Top-Right: ANYmal B300 modified for the DARPA SubT Challenge (Urban Circuit) in Olympia (Washington, USA) [Tranzatto2021cerberus]. Bottom-Left: ANYmal C100 in a limescale mine in Wiltshire (UK). Bottom-Right: ANYmal C100 in an abandoned mine in Seemühle (Switzerland).
  • Figure 2: Comparison of the robot elevation estimated by the Pronto [Camurri2017] (red) and TSIF [Bloesch2017tsif] (purple) kinematic-inertial state estimators against ground truth (green) on the SMR experiment (see Section \ref{sec:datasets}). Despite local fluctuations, the drift has a characteristic linear growth for a particular gait and terrain type. For example, between 350 s and 450 s the robot walks over soft gravel, increasing the drift rate.
  • Figure 3: An example of a foot contact sequence while trotting on gravel. After the foot touches the ground (a), both the terrain and the robot's rubber foot deform as the controller increases the applied force (b). During the stance phase, the contact point changes as the foot rolls over its hemispherical profile (c) before finally breaking contact (d).
  • Figure 4: Reference frames and landmark conventions. The world frame $\mathtt{W}$ is fixed to Earth. The base frame $\mathtt{B}$, the camera's optical frame $\mathtt{C}$, the lidar frame $\mathtt{L}$, and the IMU frame $\mathtt{I}$ are all rigidly attached to the robot's chassis. The feet are conventionally named: Left Front (LF), Right Front (RF), Left Hind (LH), and Right Hind (RH). When a foot touches the ground (e.g., RF), a contact frame $\mathtt{K}$ (perpendicular to the ground and parallel to $\mathtt{W}$'s $y$-axis) is defined. The primitives tracked by the system are points $\boldsymbol{m}$, lines $\boldsymbol{l}$, and planes $\boldsymbol{p}$. To improve numerical stability, when a new plane feature is detected, an additional local fixed anchor frame $\mathtt{A}$ is defined. Finally, a relative pose factor (between times $t_i$ and $t_m$ in the figure) is created using lidar registration.
  • Figure 5: VILENS factor graph structure. The factors are: prior (black), visual (yellow), lidar planes (green), lidar lines (red), preintegrated IMU (orange), preintegrated velocity (from leg kinematics, blue), and lidar odometry (from ICP registration, magenta). State nodes are white, while landmarks are grey.
  • ...and 13 more figures
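
The factor types listed in the Figure 5 caption can be laid out as a small data structure to make the graph's shape concrete. This is only a structural sketch: the factor names come from the caption, but the connectivity (which nodes each factor touches), the node labels such as `x0` or `m0`, and the number of states are assumptions chosen for illustration. No optimization is performed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Factor:
    kind: str    # factor type named in the Figure 5 caption
    keys: tuple  # nodes the factor connects (assumed connectivity)


@dataclass
class Graph:
    factors: list = field(default_factory=list)

    def add(self, kind, *keys):
        self.factors.append(Factor(kind, keys))

    def variables(self):
        """All state and landmark nodes referenced by at least one factor."""
        return sorted({key for f in self.factors for key in f.keys})


def build_toy_graph(num_states=3):
    """Build a toy graph with one point, one plane, and one line landmark."""
    graph = Graph()
    graph.add("prior", "x0")  # anchors the first state
    for i in range(num_states - 1):
        prev_state, next_state = f"x{i}", f"x{i + 1}"
        # Preintegrated measurements link consecutive states.
        graph.add("preintegrated_imu", prev_state, next_state)
        graph.add("preintegrated_velocity", prev_state, next_state)
    for i in range(num_states):
        # Landmark factors link a state to a tracked primitive.
        graph.add("visual", f"x{i}", "m0")
        graph.add("lidar_plane", f"x{i}", "p0")
        graph.add("lidar_line", f"x{i}", "l0")
    # Relative pose from lidar registration between two states.
    graph.add("lidar_odometry", "x0", f"x{num_states - 1}")
    return graph


graph = build_toy_graph()
for factor in graph.factors:
    print(factor.kind, factor.keys)
print("variables:", graph.variables())
```

In this layout, every state is constrained by several independent factor types, which is the property the abstract relies on when one modality degrades.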