SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM

Mingrui Li; Shuhong Liu; Heng Zhou; Guohao Zhu; Na Cheng; Tianchen Deng; Hongyu Wang

Abstract

We present SGS-SLAM, the first semantic visual SLAM system based on Gaussian Splatting. It incorporates appearance, geometry, and semantic features through multi-channel optimization, addressing the oversmoothing limitations of neural implicit SLAM systems in high-quality rendering, scene understanding, and object-level geometry. We introduce a unique semantic feature loss that effectively compensates for the shortcomings of traditional depth and color losses in object optimization. Through a semantic-guided keyframe selection strategy, we prevent erroneous reconstructions caused by cumulative errors. Extensive experiments demonstrate that SGS-SLAM delivers state-of-the-art performance in camera pose estimation, map reconstruction, precise semantic segmentation, and object-level geometric accuracy, while ensuring real-time rendering capabilities.
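The multi-channel optimization described above can be illustrated with a small sketch. It assumes, purely for illustration, that each channel (color, depth, semantic) contributes a mean-absolute-error term and that the terms are combined by a weighted sum. The names `ChannelWeights`, `l1`, and `multi_channel_loss`, the weight values, and the choice of L1 are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


def l1(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between two equally sized flat buffers."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


@dataclass(frozen=True)
class ChannelWeights:
    # Hypothetical weights; the values actually used by SGS-SLAM are not stated here.
    color: float = 1.0
    depth: float = 1.0
    semantic: float = 1.0


def multi_channel_loss(
    rendered: Mapping[str, Sequence[float]],
    observed: Mapping[str, Sequence[float]],
    w: ChannelWeights = ChannelWeights(),
) -> float:
    """Weighted sum of per-channel L1 terms (an assumed form, not the paper's exact loss).

    Both mappings must provide flat buffers under the keys
    "color", "depth", and "semantic".
    """
    return (
        w.color * l1(rendered["color"], observed["color"])
        + w.depth * l1(rendered["depth"], observed["depth"])
        + w.semantic * l1(rendered["semantic"], observed["semantic"])
    )
```

In this form the semantic term adds a penalty that color and depth alone would miss: two renders with identical color and depth error still differ in loss if one assigns the wrong semantic values to a pixel.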

Paper Structure (34 sections, 10 equations, 10 figures, 8 tables)

Introduction
Related Work
Semantic SLAM
Neural Implicit SLAM
3D Gaussian Splatting SLAM
Method
Multi-Channel Gaussian Representation
Tracking and Mapping
Camera Pose Estimation
Keyframes Selection and Weighting
Map Reconstruction
Scene Manipulation via Object-level Geometry
Experiment
Experimental Setup
Datasets
...and 19 more sections

Figures (10)

Figure 1: Overview of the proposed SGS-SLAM. It takes 2D inputs encompassing appearance, geometry, and semantic information, and leverages Gaussian Splatting and differentiable rendering for multi-channel parameter optimization. During mapping, SGS-SLAM lifts the 2D semantic prior into the 3D scene and jointly optimizes it via the mapping loss to obtain accurate 3D segmentation.
Figure 2: Qualitative comparison of our method and the baselines for reconstruction across three scenes from the Replica dataset [straub2019replica], with key details highlighted using colored boxes. The results demonstrate that our method delivers higher-fidelity and more robust reconstructions.
Figure 3: Selected novel view synthesis results for scene0000 from the ScanNet dataset [dai2017scannet]. The rendered views show reconstructed objects such as the bike, fridge, garbage bin, and guitar from novel viewpoints. Our method outperforms the baselines by a large margin, primarily due to the integration of keyframe optimization and semantic constraints. Note that the ground truth for novel views is captured from the offline-reconstructed mesh provided by the ScanNet dataset.
Figure 4: Case study on scene manipulation in room0 of the Replica dataset [straub2019replica]. We show object removal and transformation driven by specified semantic labels. SGS-SLAM allows manipulation of either individual objects or groups of items, as illustrated by removing a jar and flowers, as well as moving and rotating them.
Figure 5: Qualitative comparison of our method and DNS-SLAM [li2023dns] for semantic segmentation on the Replica dataset [straub2019replica]. The DNS-SLAM visualizations are taken from its paper, and the training-view frames are chosen to match the results presented there. Compared to NeRF-based models, our approach delivers segmentation results with higher accuracy.
...and 5 more figures
