Uncertainty-Aware Visual-Inertial SLAM with Volumetric Occupancy Mapping

Jaehyung Jung, Simon Boche, Sebastián Barbas Laina, Stefan Leutenegger

TL;DR

This work tackles robust VI-SLAM with dense occupancy mapping by making depth fusion and map construction uncertainty-aware. It fuses depth predictions from stereo and MVS networks using per-pixel uncertainty estimates, then integrates these depths into both occupancy maps and the VI-SLAM factor graph via occupancy-to-point factors. The method introduces a tightly coupled, probabilistic framework where depth uncertainty propagates through depth fusion, occupancy updates, and estimator optimization, yielding globally consistent submaps and real-time dense occupancy. Experimental results on EuRoC and Hilti-Oxford show state-of-the-art localization and mapping accuracy, while delivering real-time volumetric occupancy suitable for planning and control.

Abstract

We propose visual-inertial simultaneous localization and mapping that tightly couples sparse reprojection errors, inertial measurement unit pre-integrals, and relative pose factors with dense volumetric occupancy mapping. Hereby depth predictions from a deep neural network are fused in a fully probabilistic manner. Specifically, our method is rigorously uncertainty-aware: first, we use depth and uncertainty predictions from a deep network not only from the robot's stereo rig, but we further probabilistically fuse motion stereo that provides depth information across a range of baselines, therefore drastically increasing mapping accuracy. Next, predicted and fused depth uncertainty propagates not only into occupancy probabilities but also into alignment factors between generated dense submaps that enter the probabilistic nonlinear least squares estimator. This submap representation offers globally consistent geometry at scale. Our method is thoroughly evaluated in two benchmark datasets, resulting in localization and mapping accuracy that exceeds the state of the art, while simultaneously offering volumetric occupancy directly usable for downstream robotic planning and control in real-time.
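The probabilistic fusion of stereo and motion-stereo depth described above can be illustrated with a minimal sketch. The excerpt does not state the exact fusion rule, so the code below assumes the simplest choice: each network predicts a per-pixel depth with an independent Gaussian error, and the two are combined by inverse-variance weighting. The function names `fuse_pixel` and `fuse_depth_maps`, and the use of `None` for invalid pixels, are hypothetical and chosen only for illustration.

```python
import math


def fuse_pixel(d_a, sigma_a, d_b, sigma_b):
    """Fuse two depth estimates, assuming independent Gaussian errors.

    Returns the inverse-variance weighted mean and its standard deviation.
    """
    w_a = 1.0 / (sigma_a ** 2)  # information (inverse variance) of estimate A
    w_b = 1.0 / (sigma_b ** 2)  # information (inverse variance) of estimate B
    var = 1.0 / (w_a + w_b)     # fused variance is never larger than either input
    depth = var * (w_a * d_a + w_b * d_b)
    return depth, math.sqrt(var)


def fuse_depth_maps(depth_a, sigma_a, depth_b, sigma_b):
    """Fuse two equally sized depth maps given as nested lists.

    A pixel that is None in one map falls back to the other map;
    a pixel that is None in both maps stays None.
    """
    fused_depth, fused_sigma = [], []
    for row_da, row_sa, row_db, row_sb in zip(depth_a, sigma_a, depth_b, sigma_b):
        out_d, out_s = [], []
        for da, sa, db, sb in zip(row_da, row_sa, row_db, row_sb):
            if da is None and db is None:
                d, s = None, None
            elif da is None:
                d, s = db, sb
            elif db is None:
                d, s = da, sa
            else:
                d, s = fuse_pixel(da, sa, db, sb)
            out_d.append(d)
            out_s.append(s)
        fused_depth.append(out_d)
        fused_sigma.append(out_s)
    return fused_depth, fused_sigma


if __name__ == "__main__":
    # Hypothetical values: a confident stereo pixel and a less confident MVS pixel.
    depth, sigma = fuse_pixel(2.0, 0.1, 2.3, 0.3)
    print(f"fused depth = {depth:.3f} m, sigma = {sigma:.3f} m")
```

Under this assumption the fused estimate is pulled toward the more confident prediction and its standard deviation shrinks below both inputs, which is the behaviour one would expect when combining a short fixed baseline with longer motion-stereo baselines. Whether the fusion is carried out in depth or inverse depth is not specified here and would change the numbers.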

Paper Structure (16 sections, 14 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Our method estimates pose (black line) and volumetric occupancy represented in submaps (visualised here as colored meshes). We fuse stereo and MVS network depths based on their predicted uncertainties, which gives less noisy meshes, as can be qualitatively observed in the left mesh. Result from the Hilti-Oxford dataset [hilti2022].
  • Figure 2: An overview block diagram of the proposed method. We fuse depths from the stereo and MVS networks, and the current submap is expanded from the fused depth. Fused depth and stereo depth form map-to-map and frame-to-map factors in the visual-inertial estimator.
  • Figure 3: Predicted depth (shown as inverse depth for visualization purposes only) and its corresponding standard deviation for (top) the stereo network with the $11\,\text{cm}$ baseline, (middle) the MVS network with the $50\,\text{cm}$ maximum baseline among $8$ views, and (bottom) the depth fusion, in the EuRoC dataset.
  • Figure 4: Reconstructed mesh with color-coded mesh-to-point error in the V102 sequence. (Left) Ours and (Right) SimpleMapping [xin2023simplemapping].