Uni-SLAM: Uncertainty-Aware Neural Implicit SLAM for Real-Time Dense Indoor Scene Reconstruction

Shaoxiang Wang, Yaxu Xie, Chun-Peng Chang, Christen Millerdurai, Alain Pagani, Didier Stricker

TL;DR

Uni-SLAM tackles real-time dense indoor SLAM under varying RGB-D data quality by introducing a model-free predictive uncertainty that reweights pixel-level losses and guides local-to-global bundle adjustment. It employs a decoupled, hash-grid scene representation for geometry and appearance, enabling high-frequency detail while maintaining efficiency. On Replica, ScanNet, and TUM RGB-D, Uni-SLAM attains state-of-the-art tracking and mapping with significant depth L1 reductions and high completion percentages, while preserving real-time performance. The combination of predictive and image-level uncertainty with uncertainty-guided BA enhances robustness to outliers and data variability, making it practical for real-world robotics and AR applications.
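As a rough illustration of what reweighting pixel-level losses by uncertainty can look like, the sketch below down-weights residuals from uncertain pixels and drops pixels whose uncertainty exceeds a cutoff. The inverse-uncertainty weighting, the function name `reweighted_loss`, and the parameters `max_uncertainty` and `eps` are assumptions made for illustration; the summary above does not give the paper's actual formulation.

```python
def reweighted_loss(residuals, uncertainties, max_uncertainty=1.0, eps=1e-6):
    """Weighted mean of absolute per-pixel residuals.

    Assumed scheme (not taken from the paper): pixels with uncertainty above
    `max_uncertainty` are treated as outliers and skipped; the remaining
    pixels are weighted by 1 / (uncertainty + eps).
    """
    kept = [
        (abs(r), 1.0 / (u + eps))
        for r, u in zip(residuals, uncertainties)
        if u <= max_uncertainty
    ]
    if not kept:
        return 0.0  # no reliable pixel in this batch
    total_weight = sum(w for _, w in kept)
    return sum(r * w for r, w in kept) / total_weight


# Two well-observed pixels and one uncertain pixel with a large residual.
residuals = [0.1, 0.1, 5.0]
filtered = reweighted_loss(residuals, [0.1, 0.1, 10.0])    # third pixel is dropped
downweighted = reweighted_loss(residuals, [0.1, 0.1, 0.9])  # third pixel counts less
plain_mean = sum(abs(r) for r in residuals) / len(residuals)
```

In this toy case the uncertain pixel either disappears from the loss or contributes far less than it would in an unweighted mean, which is the robustness effect the summary attributes to the reweighting.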

Abstract

Neural implicit fields have recently emerged as a powerful representation method for multi-view surface reconstruction due to their simplicity and state-of-the-art performance. However, reconstructing thin structures of indoor scenes while ensuring real-time performance remains a challenge for dense visual SLAM systems. Previous methods do not consider the varying quality of input RGB-D data and employ a fixed-frequency mapping process to reconstruct the scene, which can result in the loss of valuable information in some frames. In this paper, we propose Uni-SLAM, a decoupled 3D spatial representation based on hash grids for indoor reconstruction. We introduce a newly defined predictive uncertainty to reweight the loss function, along with strategic local-to-global bundle adjustment. Experiments on synthetic and real-world datasets demonstrate that our system achieves state-of-the-art tracking and mapping accuracy while maintaining real-time performance. It significantly improves over current methods with a 25% reduction in depth L1 error and a 66.86% completion rate within 1 cm on the Replica dataset, reflecting a more accurate reconstruction of thin structures. Project page: https://shaoxiang777.github.io/project/uni-slam/
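For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how a depth L1 error and a completion rate within a distance threshold are commonly computed. The function names, the brute-force nearest-neighbor search, and the convention that a ground-truth depth of 0 marks an invalid pixel are assumptions for illustration; the evaluation protocol actually used in the paper may differ in detail.

```python
import math


def depth_l1(pred_depths, gt_depths):
    """Mean absolute depth error over pixels with valid ground truth (gt > 0)."""
    valid = [(p, g) for p, g in zip(pred_depths, gt_depths) if g > 0]
    return sum(abs(p - g) for p, g in valid) / len(valid)


def completion_rate(gt_points, rec_points, threshold=0.01):
    """Fraction of ground-truth points with a reconstructed point within
    `threshold` (in meters, so 0.01 corresponds to 1 cm)."""
    hits = sum(
        1
        for g in gt_points
        if min(math.dist(g, r) for r in rec_points) <= threshold
    )
    return hits / len(gt_points)


# Toy data: the second pixel has no valid ground-truth depth and is ignored.
err = depth_l1([1.0, 2.0, 3.0], [1.1, 0.0, 2.9])

# One ground-truth point is matched within 1 cm, the other is 5 cm away.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
rec = [(0.0, 0.0, 0.005), (1.0, 0.0, 0.05)]
rate = completion_rate(gt, rec)
```

A tight threshold such as 1 cm penalizes missing thin structures heavily, which is presumably why the abstract ties the completion rate to thin-structure reconstruction.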

Paper Structure

This paper contains 25 sections, 36 equations, 29 figures, 15 tables.

Figures (29)

  • Figure 1: Reconstructed 3D meshes on the TUM RGB-D dataset. One panel shows the mesh generated by our proposed method without uncertainty-guided reweighting and strategy; the other shows the mesh produced by our method after incorporating the uncertainty-aware strategy.
  • Figure 2: Uni-SLAM Architecture Overview. Our framework consists of two threads, tracking and mapping. Tracking is performed on every frame of the RGB-D stream. Constant mapping with global BA is performed every $n$ frames. In addition, an activated mapping process, triggered by uncertainty and co-visibility checks, captures local scene information using local BA and local loop closure optimization (LLCO). Our proposed pixel-level uncertainty method adaptively filters outlier pixels and reweights the loss function, enabling more precise localization during tracking and better reconstruction of color and geometric information in mapping.
  • Figure 3: Termination Probability and Uncertainty. This figure illustrates the termination probability and uncertainty during ray sampling. For pixel A with valid depth (sampling by Ray 1), the sampling density is high along this ray, leading to a high termination probability and lower uncertainty. In contrast, for pixel B with invalid depth (sampling by Ray 2), the sampling density is low along this ray, resulting in a lower termination probability and higher uncertainty, as seen in the uncertainty map (e). This leads to degraded rendering quality in regions with high uncertainty, as shown in (f). Back-projected points A and B correspond to the surfaces of the hit objects in 3D space. For point B with invalid depth, we can estimate an approximate depth value based on the model in its current state.
  • Figure 4: Strategic BA. While the tracking process is performed at every frame, we perform a constant mapping with global bundle adjustment (GBA) at a fixed frequency. Thus, the pose and map are optimized using all keyframes from the start to the end of the frame sequence. If an outlier frame is detected based on its uncertainty, a local bundle adjustment (LBA) is performed, as shown in red. If a loop closure is detected, a local loop closure optimization (LLCO) is performed, as shown in green in the figure.
  • Figure 5: Activated additional local BA. From position $P_i$ to $P_{i+1}$, a sudden large movement makes pose estimation difficult and increases uncertainty due to unseen areas. The initial estimate $\text{Init } P_{i+1}$, based on the constant-speed assumption, is hard to optimize. Therefore, besides constant global BA, we activate additional local BA based on image-level uncertainty to optimize local information. This simulates slowing down the movement. Its effectiveness is shown in the TUM RGB-D mesh figure and the BA impact table.
  • ...and 24 more figures
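
The termination probability discussed in the Figure 3 caption can be illustrated with the standard volume-rendering weights, where each sample's weight is its opacity times the probability that the ray has not terminated earlier. The sketch below is an assumption-laden illustration: the per-sample opacities (`alphas`), the function names, and the use of the weighted depth variance as an uncertainty proxy are not taken from the paper, whose own uncertainty definition is not given in this summary.

```python
def termination_weights(alphas):
    """Per-sample probability that the ray terminates at that sample.

    weight_i = alpha_i * prod_{j < i} (1 - alpha_j)
    """
    weights = []
    transmittance = 1.0  # probability the ray has survived so far
    for alpha in alphas:
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def render_depth_and_variance(sample_depths, alphas):
    """Expected depth along the ray and its weighted variance.

    The variance is used here as a hypothetical stand-in for uncertainty:
    a sharp termination peak gives low variance, a diffuse one gives high.
    """
    weights = termination_weights(alphas)
    total = sum(weights)
    if total == 0.0:
        return None, float("inf")  # the ray never terminates
    depth = sum(w * d for w, d in zip(weights, sample_depths)) / total
    variance = sum(w * (d - depth) ** 2 for w, d in zip(weights, sample_depths)) / total
    return depth, variance


sample_depths = [1.0, 2.0, 3.0, 4.0]

# Ray 1: a confident surface hit at the third sample.
sharp_depth, sharp_var = render_depth_and_variance(sample_depths, [0.0, 0.0, 1.0, 0.0])

# Ray 2: weak, spread-out opacity, as might occur without a valid depth prior.
diffuse_depth, diffuse_var = render_depth_and_variance(sample_depths, [0.2, 0.2, 0.2, 0.2])
```

Under these assumptions the sharply terminating ray yields zero variance while the diffuse ray yields a clearly positive one, mirroring the caption's contrast between pixel A (valid depth) and pixel B (invalid depth). The diffuse ray still produces an approximate depth estimate, consistent with the caption's remark about point B.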