SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM

Mingrui Li, Shuhong Liu, Heng Zhou, Guohao Zhu, Na Cheng, Tianchen Deng, Hongyu Wang

Abstract

We present SGS-SLAM, the first semantic visual SLAM system based on Gaussian Splatting. It incorporates appearance, geometry, and semantic features through multi-channel optimization, addressing the oversmoothing limitations of neural implicit SLAM systems in high-quality rendering, scene understanding, and object-level geometry. We introduce a unique semantic feature loss that effectively compensates for the shortcomings of traditional depth and color losses in object optimization. Through a semantic-guided keyframe selection strategy, we prevent erroneous reconstructions caused by cumulative errors. Extensive experiments demonstrate that SGS-SLAM delivers state-of-the-art performance in camera pose estimation, map reconstruction, precise semantic segmentation, and object-level geometric accuracy, while ensuring real-time rendering capabilities.

Paper Structure

This paper contains 34 sections, 10 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: Overview of the proposed SGS-SLAM. It takes 2D inputs encompassing appearance, geometry, and semantic information, and uses Gaussian Splatting and differentiable rendering for multi-channel parameter optimization. During mapping, SGS-SLAM lifts the 2D semantic prior into the 3D scene and jointly optimizes it via the mapping loss to obtain accurate 3D segmentation.
  • Figure 2: Qualitative comparison of our method and the baselines for reconstruction across three scenes from the Replica dataset [straub2019replica], with key details highlighted by colored boxes. The results show that our method delivers higher-fidelity and more robust reconstructions.
  • Figure 3: Selected novel view synthesis results for scene0000 from the ScanNet dataset [dai2017scannet]. The rendered views show reconstructed objects such as a bike, fridge, garbage bin, and guitar from novel viewpoints. Our method outperforms the baselines by a large margin, primarily due to the integration of keyframe optimization and semantic constraints. Note that the ground truth for novel views is captured from the offline-reconstructed mesh provided by the ScanNet dataset.
  • Figure 4: A case study on scene manipulation in room0 of the Replica dataset [straub2019replica]. We show object removal and transformation driven by semantic labels. SGS-SLAM allows manipulation of either individual objects or groups of items, illustrated here by removing a jar and flowers, as well as moving and rotating them.
  • Figure 5: Qualitative comparison of our method and DNS-SLAM [li2023dns] for semantic segmentation on the Replica dataset [straub2019replica]. The DNS-SLAM [li2023dns] visualizations are taken from its paper, and the training-view frames are chosen to match the results presented there. Compared to NeRF-based models, our approach delivers more accurate segmentation results.
  • ...and 5 more figures
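
The label-driven editing described in the Figure 4 caption can be illustrated with a small sketch. This is not the SGS-SLAM implementation: the `Gaussian` record and the `remove_by_label` and `rotate_z_and_translate` helpers are hypothetical names introduced here, and the sketch assumes each Gaussian carries a single integer semantic label and only tracks its center (a full system would also transform each Gaussian's covariance or orientation).

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Gaussian:
    """Hypothetical minimal Gaussian: a 3D center plus one semantic label id."""
    center: tuple[float, float, float]
    label: int


def remove_by_label(gaussians, labels):
    """Object removal: drop every Gaussian whose label is in `labels`."""
    labels = set(labels)
    return [g for g in gaussians if g.label not in labels]


def rotate_z_and_translate(gaussians, labels, angle_rad, offset):
    """Rotate the selected Gaussians about the z-axis through their
    centroid, then translate them by `offset`. Others are left untouched."""
    labels = set(labels)
    selected = [g for g in gaussians if g.label in labels]
    if not selected:
        return list(gaussians)

    # Centroid of the selected group in the xy-plane (pivot of the rotation).
    cx = sum(g.center[0] for g in selected) / len(selected)
    cy = sum(g.center[1] for g in selected) / len(selected)
    c, s = math.cos(angle_rad), math.sin(angle_rad)

    edited = []
    for g in gaussians:
        if g.label not in labels:
            edited.append(g)
            continue
        x, y, z = g.center
        dx, dy = x - cx, y - cy
        new_center = (
            cx + c * dx - s * dy + offset[0],
            cy + s * dx + c * dy + offset[1],
            z + offset[2],
        )
        # Assumption: only centers are tracked here; covariances are omitted.
        edited.append(replace(g, center=new_center))
    return edited


if __name__ == "__main__":
    # Toy scene: label 1 stands in for one object, label 2 for the rest.
    scene = [
        Gaussian((1.0, 0.0, 0.0), 1),
        Gaussian((-1.0, 0.0, 0.0), 1),
        Gaussian((5.0, 5.0, 5.0), 2),
    ]
    print(len(remove_by_label(scene, [1])))
    print(rotate_z_and_translate(scene, [1], math.pi / 2, (0.0, 0.0, 1.0)))
```

Passing several label ids at once selects a group of objects, which mirrors the caption's distinction between manipulating individual objects and groups of items.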