FlashSLAM: Accelerated RGB-D SLAM for Real-Time 3D Scene Reconstruction with Gaussian Splatting

Phu Pham, Damon Conover, Aniket Bera

TL;DR

FlashSLAM tackles real-time RGB-D SLAM by marrying 3D Gaussian Splatting with a fast vision-based camera-tracking pipeline. By leveraging pretrained feature matching (LightGlue + SuperPoint) and point-cloud registration, it achieves under 80 ms tracking and robust performance in sparse-view and large-motion scenarios, while mitigating consumer-depth noise with a depth-truncation strategy. The mapping framework dynamically adds Gaussians and uses ICP-based alignment with a photometric/depth loss $L = \lambda L_{color} + (1-\lambda) L_{depth}$ to produce high-fidelity reconstructions, complemented by keyframe selection and color refinement via priority sampling. Extensive experiments on Replica, TUM-RGBD, ScanNet, and self-captured data show state-of-the-art accuracy and efficiency, including high-quality novel view synthesis, making FlashSLAM a practical, high-performance SLAM solution for consumer devices.
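
To make the weighted objective $L = \lambda L_{color} + (1-\lambda) L_{depth}$ concrete, the following is a minimal sketch rather than the authors' implementation. It assumes an L1 error for both terms, flattened per-pixel values, and a depth term restricted to pixels with a positive sensor reading. The function names and the default `lam=0.5` are hypothetical choices for illustration only.

```python
def l1_mean(pred, target, mask=None):
    """Mean absolute error over flat sequences; mask selects the entries to use."""
    if mask is None:
        mask = [True] * len(pred)
    diffs = [abs(p - t) for p, t, m in zip(pred, target, mask) if m]
    # Return 0.0 when no entry is selected, so the term simply drops out.
    return sum(diffs) / len(diffs) if diffs else 0.0


def mapping_loss(color_pred, color_gt, depth_pred, depth_gt, lam=0.5):
    """Weighted sum L = lam * L_color + (1 - lam) * L_depth.

    Assumption: both terms are L1 errors, and depth pixels with a
    non-positive reading are treated as missing and ignored.
    """
    valid_depth = [d > 0 for d in depth_gt]
    l_color = l1_mean(color_pred, color_gt)
    l_depth = l1_mean(depth_pred, depth_gt, valid_depth)
    return lam * l_color + (1.0 - lam) * l_depth


# Example: two color samples and two depth samples, one depth reading missing.
loss = mapping_loss(
    color_pred=[0.2, 0.4], color_gt=[0.0, 0.0],
    depth_pred=[1.0, 2.0], depth_gt=[1.5, 0.0],
    lam=0.25,
)
```

Setting `lam` closer to 1 weights the photometric term more heavily, while values closer to 0 favor agreement with the measured depth.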

Abstract

We present FlashSLAM, a novel SLAM approach that leverages 3D Gaussian Splatting (3DGS) for efficient and robust 3D scene reconstruction. Existing 3DGS-based SLAM methods often fall short in sparse-view settings and during large camera movements because they rely on gradient descent-based optimization, which is both slow and inaccurate. FlashSLAM addresses these limitations by combining 3DGS with a fast vision-based camera tracking technique that uses a pretrained feature matching model and point cloud registration to estimate pose precisely in under 80 ms, a 90% reduction in tracking time compared to SplaTAM, without costly iterative rendering. In sparse settings, our method achieves up to a 92% improvement in average tracking accuracy over previous methods. Additionally, it accounts for noise in depth sensors, enhancing robustness when using unspecialized devices such as smartphones. Extensive experiments show that FlashSLAM performs reliably across both sparse and dense settings, in synthetic and real-world environments. Evaluations on benchmark datasets highlight its superior accuracy and efficiency, establishing FlashSLAM as a versatile, high-performance solution for SLAM that advances the state of the art in 3D reconstruction across diverse applications.
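
The tracking idea described above (matched features lifted to 3D, then registered to recover a rigid pose) can be illustrated with a short NumPy sketch. It is not the authors' pipeline. It assumes pixel matches between two frames are already available (for example from a feature matcher), uses the standard closed-form Kabsch/SVD solver as a stand-in for the registration step, and applies a simple depth cutoff as one possible form of depth truncation. All function names and the `max_depth=5.0` default are hypothetical.

```python
import numpy as np


def backproject(pixels, depths, fx, fy, cx, cy):
    """Lift (u, v) pixels with metric depth to 3D points in the camera frame."""
    u, v = pixels[:, 0], pixels[:, 1]
    x = (u - cx) * depths / fx
    y = (v - cy) * depths / fy
    return np.stack([x, y, depths], axis=1)


def truncate_matches(depth_a, depth_b, max_depth=5.0):
    """Boolean mask keeping matches whose depth is valid and below the cutoff.

    Assumption: far readings from consumer depth sensors are the noisiest,
    so matches beyond max_depth (a hypothetical value) are discarded.
    """
    return (depth_a > 0) & (depth_a < max_depth) & (depth_b > 0) & (depth_b < max_depth)


def estimate_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with dst ~= R @ src + t (Kabsch)."""
    if src.shape[0] < 3:
        raise ValueError("need at least 3 point correspondences")
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

In use, the mask from `truncate_matches` would filter both sets of matched pixels before `backproject` and `estimate_rigid_transform` are called. A practical system would typically wrap the solver in an outlier-rejection loop such as RANSAC, which this sketch omits.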

Paper Structure

This paper contains 34 sections, 11 equations, 9 figures, 12 tables.

Figures (9)

  • Figure 1: Overview of FlashSLAM: Our approach takes RGB-D inputs to perform accurate 3D scene reconstruction. Initially, precise matches between consecutive frames are detected, which enables tracking of the camera pose through a rigid transformation. This pose is further refined using gradient-based optimization, leveraging Gaussian alignment to ensure accurate registration of new frames with the existing 3D model. The mapping process updates and transforms existing Gaussian splats in the 3D scene, producing high-quality reconstructions with efficient alignment and optimization steps.
  • Figure 2: Rendering comparison on Replica dataset.
  • Figure 3: Novel view synthesis results for two scenes from the ScanNet++ dataset. The first three columns show the results for scene ID 8b5caf3398, and the last three columns correspond to scene ID b20a261fdf.
  • Figure 4: Novel view synthesis results with depth for scene b20a261fdf from the ScanNet++ dataset. The left columns display RGB images, and the right columns show the corresponding depth maps.
  • Figure 5: Example of a failed reconstruction by SplaTAM on our custom dataset, showing severe distortion and misaligned geometry due to tracking failures.
  • ...and 4 more figures