Compact 3D Gaussian Splatting For Dense Visual SLAM

Tianchen Deng, Yaohui Chen, Leyan Zhang, Jianfei Yang, Shenghai Yuan, Jiuming Liu, Danwei Wang, Hesheng Wang, Weidong Chen

TL;DR

This work tackles the memory and training-speed bottlenecks of 3D Gaussian Splatting in dense SLAM by introducing a compact representation. It integrates a sliding-window online mask to prune redundant Gaussians, a residual-vector geometry codebook to compress scale/rotation (and color/opacity), and a global bundle adjustment with reprojection loss to boost pose accuracy. Across Replica, ScanNet, and TUM-RGBD, the approach achieves faster training and rendering while preserving state-of-the-art scene reconstruction, with significant memory reductions and the ability to plug into existing GS-based SLAM systems. The results demonstrate practical viability for real-time dense SLAM on resource-constrained devices and open pathways for broader adoption of Gaussian-based representations.

Abstract

Recent work has shown that 3D Gaussian-based SLAM enables high-quality reconstruction, accurate pose estimation, and real-time rendering of scenes. However, these approaches are built on a tremendous number of redundant 3D Gaussian ellipsoids, leading to high memory and storage costs and slow training speed. To address these limitations, we propose a compact 3D Gaussian Splatting SLAM system that reduces both the number and the parameter size of Gaussian ellipsoids. A sliding window-based masking strategy is first proposed to remove redundant ellipsoids. We then observe that the covariance matrices (geometry) of most 3D Gaussian ellipsoids are extremely similar, which motivates a novel geometry codebook to compress the geometric attributes of the 3D Gaussians, i.e., the scale and rotation parameters. Robust and accurate pose estimation is achieved by a global bundle adjustment method with reprojection loss. Extensive experiments demonstrate that our method achieves faster training and rendering speed while maintaining state-of-the-art (SOTA) quality of the scene representation.
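To make the codebook idea concrete, the following is a minimal sketch of residual vector quantization, the general technique the TL;DR refers to as a "residual-vector geometry codebook". It is not the authors' implementation: the function names (`nearest_code`, `rvq_encode`, `rvq_decode`), the two-stage layout, and the hand-picked toy codebooks are all assumptions made for illustration. In a real system the codebooks would be learned and the vectors would be per-Gaussian attributes such as scale and rotation.

```python
import numpy as np

def nearest_code(x, codebook):
    """Index of the closest codeword (squared Euclidean) for each row of x.

    x: (N, D) vectors, codebook: (K, D) codewords -> (N,) integer indices.
    """
    d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def rvq_encode(x, codebooks):
    """Greedy residual quantization: each stage encodes what the previous stages missed."""
    residual = np.array(x, dtype=float)
    indices = []
    for cb in codebooks:
        idx = nearest_code(residual, cb)
        indices.append(idx)
        residual = residual - cb[idx]  # pass the remaining error to the next stage
    return np.stack(indices, axis=1)  # (N, number_of_stages)

def rvq_decode(indices, codebooks):
    """Reconstruct each vector as the sum of its selected codewords."""
    out = np.zeros((indices.shape[0], codebooks[0].shape[1]))
    for stage, cb in enumerate(codebooks):
        out += cb[indices[:, stage]]
    return out

# Hypothetical toy codebooks (coarse stage, then fine stage); not learned values.
codebooks = [
    np.array([[0.0, 0.0], [10.0, 10.0]]),
    np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]),
]
x = np.array([[10.9, 10.1], [0.2, 0.8]])

codes = rvq_encode(x, codebooks)       # small integer indices stored per vector
approx = rvq_decode(codes, codebooks)  # vectors recovered from the indices
```

Under these assumptions, each vector is stored as one small integer per stage instead of a full set of floating-point values, which is where the memory saving comes from when many Gaussians share similar geometry.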

Paper Structure (11 sections, 17 equations, 10 figures, 8 tables)

This paper contains 11 sections, 17 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: Our framework minimizes storage and accelerates rendering while maintaining the SOTA image reconstruction performance. The proposed framework eliminates unnecessary 3D Gaussian ellipsoids without affecting performance. We highlight and enlarge some areas to show the significant reduction of 3D Gaussian points.
  • Figure 2: The pipeline of our GS-based SLAM system. The system takes RGB-D images as input. We start the SLAM system by initializing the 3D Gaussian map. Then, we update the map by adding new Gaussians and using the learnable mask to remove redundant 3D Gaussian ellipsoids. We incorporate a codebook-based vector quantization method to compress the scene representation. For camera tracking, we maintain a global keyframe database for local-to-global BA and use reprojection loss for robust pose estimation.
  • Figure 3: The core distinctions between GS-based SLAM systems and original 3DGS: the data processing and keyframe selection.
  • Figure 4: The KL divergence distribution of the Gaussian ellipsoids during online training of the SLAM system at different time steps (500, 1000, 1500, 2000). A larger blue region signifies lower similarity between the 3D Gaussian ellipsoids, whereas a smaller region with a higher peak indicates greater similarity. We observe that the geometric similarity remains consistently high throughout the operation of the GS-based SLAM system.
  • Figure 5: The left figure shows the learnable mask strategy. We perform frustum selection and sliding window reset to efficiently remove redundant Gaussian ellipsoids while maintaining reconstruction accuracy. The dashed lines represent the removed 3D Gaussian ellipsoids. The right figure shows the count of Gaussian ellipsoids over the course of SLAM operation. The two curves compare our system with and without masks. Our mask strategy achieves a $1.97\times$ reduction in the number of 3D Gaussians.
  • ...and 5 more figures
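The similarity measure in the Figure 4 caption can be illustrated with the closed-form KL divergence between two Gaussians. The sketch below is an assumption-laden illustration, not the paper's procedure: it compares geometry only by treating both Gaussians as zero-mean, builds each covariance as $R S S^\top R^\top$ from a scale vector and a unit quaternion (the usual 3DGS parameterization), and uses hypothetical helper names (`quat_to_rotmat`, `covariance`, `kl_zero_mean`). How the authors pair Gaussians or handle their means is not stated in this summary.

```python
import numpy as np

def quat_to_rotmat(q):
    """Rotation matrix from a quaternion given as (w, x, y, z); normalized first."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def covariance(scale, quat):
    """Covariance of an ellipsoid: Sigma = R S S^T R^T with S = diag(scale)."""
    R = quat_to_rotmat(np.asarray(quat, dtype=float))
    S = np.diag(np.asarray(scale, dtype=float))
    return R @ S @ S.T @ R.T

def kl_zero_mean(cov_p, cov_q):
    """KL(N(0, cov_p) || N(0, cov_q)) in closed form.

    0.5 * (tr(cov_q^-1 cov_p) - k + ln det(cov_q) - ln det(cov_p))
    """
    k = cov_p.shape[0]
    trace_term = np.trace(np.linalg.solve(cov_q, cov_p))
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (trace_term - k + logdet_q - logdet_p)

# Hypothetical ellipsoids: identical geometry, then one axis doubled in scale.
identity_quat = [1.0, 0.0, 0.0, 0.0]
cov_a = covariance([0.1, 0.2, 0.3], identity_quat)
cov_b = covariance([0.1, 0.2, 0.3], identity_quat)
cov_c = covariance([0.2, 0.2, 0.3], identity_quat)

kl_same = kl_zero_mean(cov_a, cov_b)  # identical shapes give zero divergence
kl_diff = kl_zero_mean(cov_a, cov_c)  # differing shapes give a positive value
```

A histogram of such values over many pairs of ellipsoids would concentrate near zero when most shapes are alike, which is the behaviour the caption describes as a small region with a high peak.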