High-Fidelity SLAM Using Gaussian Splatting with Rendering-Guided Densification and Regularized Optimization

Shuo Sun; Malcolm Mielle; Achim J. Lilienthal; Martin Magnusson

TL;DR

This paper tackles the challenge of online, high-fidelity dense RGBD SLAM by extending 3D Gaussian Splatting to simultaneous online mapping and tracking. It introduces rendering-guided Gaussian densification to fill holes and refine reobserved regions, and a regularized continual mapping objective to mitigate forgetting across frames. The method achieves state-of-the-art reconstruction on Replica and competitive results on TUM-RGBD, outperforming several neural implicit and Gaussian-based baselines in rendering fidelity while preserving realistic geometry. Limitations include sensitivity to motion blur in real-world data and the absence of loop-closure mechanisms, with future work aimed at loop closure, pose-graph optimization, real-time performance, and semantic integration.
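As a rough illustration of what a regularized continual mapping objective can look like, the sketch below adds a drift penalty to the rendering loss. This is a generic technique chosen for illustration, not necessarily the formulation used in the paper: the names `regularized_loss`, `prev_params`, `seen_counts`, and the weight `lam` are all hypothetical.

```python
def regularized_loss(render_loss, params, prev_params, seen_counts, lam=0.1):
    """Rendering loss plus a penalty on parameter drift (illustrative only).

    params      -- current values of the shared Gaussian parameters
    prev_params -- their values before optimizing on the latest frame
    seen_counts -- hypothetical per-parameter weights, e.g. how many earlier
                   frames observed the corresponding Gaussian
    lam         -- hypothetical regularization strength
    """
    # Parameters seen by many earlier frames are pulled more strongly
    # toward their previous values, so the latest frame cannot freely
    # overwrite them.
    penalty = sum(
        c * (p - q) ** 2 for p, q, c in zip(params, prev_params, seen_counts)
    )
    return render_loss + lam * penalty
```

With `lam=0` this reduces to the plain rendering loss, which is the setting in which overfitting to the latest frame would be expected.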

Abstract

We propose a dense RGBD SLAM system based on 3D Gaussian Splatting that provides metrically accurate pose tracking and visually realistic reconstruction. To this end, we first propose a Gaussian densification strategy based on the rendering loss to map unobserved areas and refine reobserved areas. Second, we introduce extra regularization parameters to alleviate the forgetting problem in continual mapping, where parameters tend to overfit the latest frame and degrade the rendering quality of previous frames. Both mapping and tracking are performed with Gaussian parameters by minimizing the re-rendering loss in a differentiable way. Compared to recent neural and concurrently developed Gaussian Splatting RGBD SLAM baselines, our method achieves state-of-the-art results on the synthetic dataset Replica and competitive results on the real-world dataset TUM.
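The densification idea can be sketched as a per-pixel decision: seed a new Gaussian where the current map renders with low opacity (unobserved area) or where the rendered color or depth disagrees with the sensor (area to refine). The code below is a minimal sketch under assumptions: images are flat lists of single-channel values, zero depth marks an invalid measurement, and the function name and all thresholds are hypothetical rather than taken from the paper.

```python
def densification_mask(opacity, rendered_color, gt_color,
                       rendered_depth, gt_depth,
                       opacity_thresh=0.5, color_thresh=0.1, depth_thresh=0.05):
    """Return one boolean per pixel: True where a new Gaussian would be seeded.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    mask = []
    for o, rc, gc, rd, gd in zip(opacity, rendered_color, gt_color,
                                 rendered_depth, gt_depth):
        if gd <= 0:
            # Assumed convention: no valid depth, so the pixel cannot be
            # back-projected into 3D and is skipped.
            mask.append(False)
            continue
        unobserved = o < opacity_thresh
        poorly_rendered = abs(rc - gc) > color_thresh or abs(rd - gd) > depth_thresh
        mask.append(unobserved or poorly_rendered)
    return mask


# Five pixels: well rendered, low opacity, depth error, color error, invalid depth.
mask = densification_mask(
    opacity=[0.9, 0.1, 0.9, 0.9, 0.1],
    rendered_color=[0.5, 0.5, 0.5, 0.9, 0.5],
    gt_color=[0.5, 0.5, 0.5, 0.2, 0.5],
    rendered_depth=[1.0, 1.0, 1.0, 1.0, 1.0],
    gt_depth=[1.0, 1.0, 1.5, 1.0, 0.0],
)
```

Pixels selected by the mask would then be back-projected with the measured depth and the known camera pose to initialize new Gaussians; that step is omitted here.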

Paper Structure (15 sections, 11 equations, 5 figures, 5 tables)

Introduction
Related Work
Visual SLAM
Photo-realistic reconstruction
Preliminary: Gaussian Splatting Rendering
Methods
Mapping
Tracking
Experiments
Experiment Setup
Reconstruction Performance
Tracking Performance
Ablation Study
Runtime
Summary and Future Work

Figures (5)

Figure 1: Overview of the method. Our method takes RGBD frames as input. During mapping, given a posed RGBD frame, we first render the opacity, color, and depth images. We then compare them with the ground truth to densify the existing map. During tracking, we minimize the color and depth re-rendering loss to optimize the camera pose.
Figure 2: Illustration of the forgetting problem in continual mapping with Gaussians. The Gaussians colored yellow are shared by camera0 and camera1. However, these Gaussians tend to be optimized to overfit the latest frame (camera1), which reduces reconstruction quality for previous frames.
Figure 3: Rendering results on the $\texttt{Replica}$ dataset. The second and fourth rows show zoomed-in details of the colored squares. Compared to Point-SLAM [sandstrom2023point], our method generates sharper results; compared to SplaTAM [keetha2023splatam], our method does not have floaters.
Figure 4: Comparison without and with regularization. The first column shows the rendered result right after mapping $\texttt{frame0}$. The second column shows the rendered image of $\texttt{frame0}$ after processing 350 frames without regularization, and the third column shows the same with regularization. The second row shows zoomed-in views of the green square: with regularization, the result still maintains good quality, especially at edges.
Figure 5: The ground truth image and the rendered image from the $\texttt{TUM}$ dataset. Our method maps by accumulating all previous frames, which yields more visually pleasing image quality. However, the ground truth image is poor in this case due to motion blur and varying exposure, which makes tracking difficult. It also negatively affects the image similarity metric: PSNR is only 17.16 in this example.
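For readers unfamiliar with the PSNR metric quoted in Figure 5, the following sketch shows the standard definition. It assumes images are flat sequences of intensities normalized to [0, 1] and that the reported value is in decibels; the function name `psnr` is illustrative.

```python
import math


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(rendered) != len(reference) or not rendered:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(rendered, reference)) / len(rendered)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Under these assumptions, a PSNR of 17.16 corresponds to a mean squared error of about 0.019, that is, a root-mean-square intensity error of roughly 0.14 on a [0, 1] scale. Because the reference image itself is blurred, a sharp rendering is penalized by this metric even when it looks better.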
