VPGS-SLAM: Voxel-based Progressive 3D Gaussian SLAM in Large-Scale Scenes

Tianchen Deng, Wenhua Wu, Junjie He, Yue Pan, Xirui Jiang, Shenghai Yuan, Danwei Wang, Hesheng Wang, Weidong Chen

TL;DR

VPGS-SLAM addresses memory growth and pose drift in large-scale 3D Gaussian-based SLAM by introducing a voxel-based progressive representation organized into submaps. It combines a 2D-3D fusion camera tracking pipeline with a BEV-informed loop-closure mechanism and online submap distillation to maintain global consistency across long sequences and diverse environments. The approach targets scalable, real-time-capable mapping and robust tracking in both indoor and outdoor scenes, and the authors report that it outperforms prior GS-based methods on multiple benchmarks. The authors state that the code will be released as open source.

Abstract

3D Gaussian Splatting has recently shown promising results in dense visual SLAM. However, existing 3DGS-based SLAM methods are all constrained to small-room scenarios and struggle with memory explosion in large-scale scenes and long sequences. To this end, we propose VPGS-SLAM, the first 3DGS-based large-scale RGBD SLAM framework for both indoor and outdoor scenarios. We design a novel voxel-based progressive 3D Gaussian mapping method with multiple submaps for compact and accurate scene representation in large-scale and long-sequence scenes. This allows us to scale up to arbitrary scenes and improves robustness, even under pose drift. In addition, we propose a 2D-3D fusion camera tracking method to achieve robust and accurate camera tracking in both indoor and outdoor large-scale scenes. Furthermore, we design a 2D-3D Gaussian loop closure method to eliminate pose drift. We further propose a submap fusion method with online distillation to achieve global consistency in large-scale scenes when a loop is detected. Experiments on various indoor and outdoor datasets demonstrate the superiority and generalizability of the proposed framework. The code will be open-sourced at https://github.com/dtc111111/vpgs-slam.

Paper Structure

This paper contains 9 sections, 9 equations, 3 figures, 8 tables.

Figures (3)

  • Figure 1: We present VPGS-SLAM, the first large-scale SLAM framework with voxel-based progressive 3D Gaussian representation, 2D-3D assisted camera tracking and 3D Gaussian loop closure. Depicted in the middle, we demonstrate the large-scale globally consistent 3D Gaussian map built with our approach. At the top and bottom of the figure, we include zoomed-in views of the map with RGB and depth images rendered by our method, indicated by dashed blue and yellow boxes.
  • Figure 2: System Overview. Our system is a large-scale SLAM framework, with voxel-based progressive 3D Gaussian representation, 2D-3D fusion camera tracking, and 3D Gaussian loop closure. Our framework takes color images and 3D point clouds as input. Our method can achieve accurate and efficient scene reconstruction, camera tracking, and global map generation.
  • Figure 3: Qualitative comparison between our proposed method and existing SOTA methods: SplaTAM and Loop-Splat. We demonstrate RGB image rendering results on the KITTI odometry dataset. Our method shows improved rendering quality compared to these existing methods.