Voxel-SLAM: A Complete, Accurate, and Versatile LiDAR-Inertial SLAM System

Zheng Liu, Haotian Li, Chongjian Yuan, Xiyuan Liu, Jiarong Lin, Rundong Li, Chunran Zheng, Bingyang Zhou, Wenyi Liu, Fu Zhang

TL;DR

Voxel-SLAM tackles robust, real-time LiDAR-inertial SLAM in both single-session and multi-session settings by unifying all modules under an adaptive voxel map and exploiting four data associations: short-term, mid-term, long-term, and multi-map. The system couples a BALM2-based LiDAR-inertial BA with an efficient three-level data pyramid to support initialization, odometry, local mapping, loop closure, and global mapping in real time. Key contributions include robust initialization from highly dynamic starts, a sliding-window LiDAR-inertial BA for local refinement, loop closure across multiple sessions with a unified map, and a hierarchical global BA that ensures global consistency. The approach achieves state-of-the-art accuracy on diverse datasets (Hilti, MARS-LVIG, UrbanNav) and maintains real-time performance on resource-constrained hardware, with explicit support for multi-session relocalization and online global map refinement.
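
The TL;DR credits a BALM2-based LiDAR-inertial BA, and Figure 3 mentions map points kept "in the form of point cluster". The sketch below illustrates that idea under a stated assumption: following BALM2's point-cluster formulation, each plane feature is summarized by its point count N, the sum of its points, and the sum of their outer products, and its plane cost is the smallest eigenvalue of the resulting covariance. Under a rigid transform the summary can be updated in closed form, so the cost can be evaluated without revisiting raw points. The class PointCluster and its methods are hypothetical names for illustration, not the authors' code.

```python
import numpy as np


class PointCluster:
    """Hypothetical point-cluster summary of one plane feature (BALM2-style).

    Stores N, the sum of points v = sum p, and the sum of outer products
    S = sum p p^T, so the plane cost can be evaluated without raw points.
    """

    def __init__(self, n=0, v=None, s=None):
        self.n = n
        self.v = np.zeros(3) if v is None else np.asarray(v, dtype=float)
        self.s = np.zeros((3, 3)) if s is None else np.asarray(s, dtype=float)

    @classmethod
    def from_points(cls, pts):
        pts = np.asarray(pts, dtype=float)
        return cls(len(pts), pts.sum(axis=0), pts.T @ pts)

    def __add__(self, other):
        # Clusters of the same plane observed in different scans simply add up.
        return PointCluster(self.n + other.n, self.v + other.v, self.s + other.s)

    def transformed(self, R, t):
        """Cluster of {R p + t}, computed from (N, v, S) alone."""
        rv = R @ self.v
        s = (R @ self.s @ R.T + np.outer(rv, t) + np.outer(t, rv)
             + self.n * np.outer(t, t))
        return PointCluster(self.n, rv + self.n * t, s)

    def covariance(self):
        mean = self.v / self.n
        return self.s / self.n - np.outer(mean, mean)

    def plane_cost(self):
        """Smallest covariance eigenvalue; (numerically) zero when all points are coplanar."""
        return np.linalg.eigvalsh(self.covariance())[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    plane = np.c_[rng.uniform(-1, 1, (50, 2)), np.zeros(50)]  # points on z = 0
    scan_a = PointCluster.from_points(plane)
    scan_b = PointCluster.from_points(plane[:25])
    R, zero = np.eye(3), np.zeros(3)
    aligned = scan_a.transformed(R, zero) + scan_b.transformed(R, zero)
    offset = scan_a.transformed(R, zero) + scan_b.transformed(R, np.array([0.0, 0.0, 0.1]))
    print(aligned.plane_cost(), offset.plane_cost())
```

In this toy example, two scans observing the same plane give a near-zero cost when their poses agree and a clearly positive cost when one scan is offset by 10 cm along the plane normal; a BA in this style would minimize such costs over the scan poses.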

Abstract

In this work, we present Voxel-SLAM: a complete, accurate, and versatile LiDAR-inertial SLAM system that fully utilizes short-term, mid-term, long-term, and multi-map data associations to achieve real-time estimation and high-precision mapping. The system consists of five modules (initialization, odometry, local mapping, loop closure, and global mapping), all of which share the same map representation: an adaptive voxel map. The initialization provides an accurate initial state estimate and a consistent local map for the subsequent modules, enabling the system to start from a highly dynamic initial state. The odometry, exploiting the short-term data association, rapidly estimates the current state and detects potential system divergence. The local mapping, exploiting the mid-term data association, employs a local LiDAR-inertial bundle adjustment (BA) to refine the states (and the local map) within a sliding window of recent LiDAR scans. The loop closure detects previously visited places in the current session and in all previous sessions. The global mapping refines the global map with an efficient hierarchical global BA. Both the loop closure and the global mapping exploit long-term and multi-map data associations. We conducted a comprehensive benchmark comparison against other state-of-the-art methods on 30 sequences from three representative scene types: narrow indoor environments captured with hand-held equipment, large-scale wilderness environments captured with aerial robots, and urban environments captured from vehicle platforms. Additional experiments demonstrate the robustness and efficiency of the initialization, the capacity to work across multiple sessions, and relocalization in degenerate environments.
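
The abstract states that all five modules share an adaptive voxel map but does not describe its construction. The sketch below assumes a VoxelMap-style scheme: points are hashed into fixed-size root voxels, and each root voxel is recursively split into octants until its points pass a planarity test (smallest covariance eigenvalue below a threshold) or a maximum depth is reached. The function names, root size, depth limit, point minimum, and threshold are hypothetical choices for illustration, not the paper's parameters.

```python
import numpy as np
from collections import defaultdict

# Hypothetical parameters chosen for illustration; not taken from the paper.
ROOT_SIZE = 1.0      # edge length of a root voxel (m)
MAX_DEPTH = 3        # maximum octree subdivision depth
EIG_THRESH = 1e-4    # planarity threshold on the smallest covariance eigenvalue (m^2)
MIN_POINTS = 5       # fewer points than this cannot be judged planar


def is_plane(pts):
    """Planarity test: the smallest eigenvalue of the point covariance is tiny."""
    if len(pts) < MIN_POINTS:
        return False
    cov = np.cov(pts.T, bias=True)
    return np.linalg.eigvalsh(cov)[0] < EIG_THRESH


def subdivide(pts, center, half, depth, leaves):
    """Recursively split a voxel into octants until it is planar or MAX_DEPTH is hit."""
    if len(pts) == 0:
        return
    planar = is_plane(pts)
    if planar or depth == MAX_DEPTH:
        leaves.append((depth, planar, len(pts)))
        return
    bits = (pts >= center).astype(int)              # per-axis side of the center
    codes = bits[:, 0] * 4 + bits[:, 1] * 2 + bits[:, 2]
    for code in range(8):
        sign = np.array([(code >> 2) & 1, (code >> 1) & 1, code & 1]) * 2 - 1
        subdivide(pts[codes == code], center + sign * half / 2, half / 2, depth + 1, leaves)


def build_voxel_map(pts):
    """Hash points into root voxels, then adaptively subdivide each root voxel."""
    buckets = defaultdict(list)
    for key, p in zip(map(tuple, np.floor(pts / ROOT_SIZE).astype(int)), pts):
        buckets[key].append(p)
    voxel_map = {}
    for key, bucket in buckets.items():
        center = (np.array(key) + 0.5) * ROOT_SIZE
        leaves = []
        subdivide(np.array(bucket), center, ROOT_SIZE / 2, 0, leaves)
        voxel_map[key] = leaves
    return voxel_map
```

With this scheme a single flat wall stays one coarse planar voxel, while a corner where two planes meet is split into smaller voxels until each leaf holds one plane or the depth limit is reached, which is the kind of adaptivity the name suggests.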

Paper Structure

This paper contains 41 sections, 9 equations, 15 figures, and 7 tables.

Figures (15)

  • Figure 1: Overview of Voxel-SLAM. The green parts are the modules of Voxel-SLAM: initialization, odometry, local mapping, loop closure, and global mapping. Modules in the same gray dashed box run in the same thread. The blue parts at the bottom are the data pyramid of multiple sessions, and the red part in the center is the adaptive voxel map.
  • Figure 2: The factor graph representation of the proposed LiDAR-inertial bundle adjustment.
  • Figure 3: The factor graph representation of the LiDAR-inertial bundle adjustment used in local mapping. The "identity matrix" represents the map constraints from the "fix" points (represented in the form of point clusters), which lie outside the sliding-window optimization and are expressed in the world frame.
  • Figure 4: Drifting and traveling distances. (a) The candidate loop keyframe is in the current session. The traveling distance is the distance accumulated along the current session from the candidate loop keyframe B to the current keyframe A. (b) The candidate loop keyframe is in a previous session. The traveling distance is the distance accumulated along the previous session from the candidate loop keyframe B to the previous loop keyframe C, and then along the current session from C to the current keyframe A. In both cases, the drifting distance is the relative distance between the current keyframe A and the candidate loop keyframe B, computed by the loop detection method BTC [yuan2024btc].
  • Figure 5: The scans before and after a PGO. The red nodes are the scans that have been added to the pose graph. The green nodes are the scans marginalized by the local mapping while the loop closure was executing, so they have not yet been added to the pose graph. The blue nodes are the scans in the current sliding window of the local mapping. The nodes at the top and bottom are the scans before and after PGO, respectively. $\Delta \mathbf T_\text{p}$ is the pose correction of the last scan in the pose graph and is used to correct the subsequent scans beyond the pose graph.
  • ...and 10 more figures
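
Figure 5 describes how a pose-graph correction reaches scans that are not yet in the graph. Assuming poses are 4x4 homogeneous world-frame transforms and that $\Delta \mathbf T_\text{p}$ left-multiplies a pose (a convention the caption does not state), the propagation can be sketched as follows. The helper names se3, pose_correction, and correct_trailing_scans are hypothetical, and the planar yaw-only poses are only a convenience for the example.

```python
import numpy as np


def se3(yaw, t):
    """4x4 homogeneous pose from a yaw angle (rad) and a translation (planar example)."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = t
    return T


def pose_correction(T_before, T_after):
    """Delta T_p with T_after = Delta T_p @ T_before (world-frame, left-multiplied)."""
    return T_after @ np.linalg.inv(T_before)


def correct_trailing_scans(poses, delta):
    """Apply the last graph scan's correction to scans not yet in the pose graph."""
    return [delta @ T for T in poses]


if __name__ == "__main__":
    last_before = se3(0.10, [10.0, 2.0, 0.0])   # last scan in the graph, before PGO
    last_after = se3(0.05, [9.5, 2.3, 0.0])     # same scan, after PGO
    delta = pose_correction(last_before, last_after)
    trailing = [se3(0.12, [11.0, 2.2, 0.0]), se3(0.15, [12.0, 2.5, 0.1])]
    for T in correct_trailing_scans(trailing, delta):
        print(np.round(T[:3, 3], 3))
```

Because every trailing scan (the green and blue nodes) is left-multiplied by the same $\Delta \mathbf T_\text{p}$, its pose relative to the last graph scan is unchanged, so the local chain of scans stays internally consistent while the whole tail is moved onto the corrected graph.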