SemGauss-SLAM: Dense Semantic Gaussian Splatting SLAM

Siting Zhu, Renjie Qin, Guangming Wang, Jiuming Liu, Hesheng Wang

TL;DR

SemGauss-SLAM tackles dense semantic SLAM by embedding semantic features into a 3D Gaussian splatting representation. It introduces a feature-level loss and semantic-informed bundle adjustment, which together enable multi-view optimization of both the camera poses and the 3D semantic map. On Replica and ScanNet, the method outperforms radiance-field-based SLAM baselines in mapping, tracking, and semantic reconstruction, and it delivers high-quality novel-view semantic segmentation. The approach combines efficient differentiable rendering with multi-frame semantic constraints for dense semantic understanding of 3D scenes.

Abstract

We propose SemGauss-SLAM, a dense semantic SLAM system utilizing 3D Gaussian representation, that enables accurate 3D semantic mapping, robust camera tracking, and high-quality rendering simultaneously. In this system, we incorporate semantic feature embedding into 3D Gaussian representation, which effectively encodes semantic information within the spatial layout of the environment for precise semantic scene representation. Furthermore, we propose feature-level loss for updating 3D Gaussian representation, enabling higher-level guidance for 3D Gaussian optimization. In addition, to reduce cumulative drift in tracking and improve semantic reconstruction accuracy, we introduce semantic-informed bundle adjustment. By leveraging multi-frame semantic associations, this strategy enables joint optimization of 3D Gaussian representation and camera poses, resulting in low-drift tracking and accurate semantic mapping. Our SemGauss-SLAM demonstrates superior performance over existing radiance field-based SLAM methods in terms of mapping and tracking accuracy on Replica and ScanNet datasets, while also showing excellent capabilities in high-precision semantic segmentation and dense semantic mapping.
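
As a rough illustration of the feature-level loss mentioned in the abstract, the sketch below compares a rendered semantic feature map with the feature map produced by the feature extractor. It assumes a mean per-pixel L1 distance, which the abstract does not specify. The name `feature_l1_loss` and the list-of-vectors layout are hypothetical and chosen only for illustration.

```python
from typing import List

# One feature vector per pixel (hypothetical layout, for illustration only).
FeatureMap = List[List[float]]


def feature_l1_loss(rendered: FeatureMap, extracted: FeatureMap) -> float:
    """Mean absolute difference between rendered and extracted features.

    Assumption: the loss is an L1 distance averaged over all pixels and
    channels. The abstract only states that a feature-level loss is used.
    """
    if len(rendered) != len(extracted):
        raise ValueError("feature maps must cover the same number of pixels")

    total = 0.0
    count = 0
    for rendered_vec, extracted_vec in zip(rendered, extracted):
        if len(rendered_vec) != len(extracted_vec):
            raise ValueError("feature vectors must have the same dimension")
        for r, e in zip(rendered_vec, extracted_vec):
            total += abs(r - e)
            count += 1

    # An empty map contributes no error.
    return total / count if count else 0.0


if __name__ == "__main__":
    rendered = [[1.0, 2.0], [0.0, 0.0]]
    extracted = [[1.0, 0.0], [1.0, 0.0]]
    print(feature_l1_loss(rendered, extracted))
```

In a full system, a term like this would be added to the color and depth losses and minimized with respect to the Gaussian parameters. The weighting between those terms is not given in this summary.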

Paper Structure

This paper contains 11 sections, 11 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Our SemGauss-SLAM incorporates semantic feature embedding into 3D Gaussian representation to perform dense semantic SLAM. This modeling strategy not only achieves accurate semantic mapping, but also enables high-precision semantic novel view synthesis compared with other radiance field-based semantic SLAM. We visualize 3D Gaussian blobs with semantic embedding, showing the spatial layout of semantic Gaussian representation. Moreover, semantic mapping is visualized using semantic feature embedding, showing 3D semantic modeling of the scene.
  • Figure 2: An overview of SemGauss-SLAM. Our method takes an RGB-D stream as input. RGB images are fed into a feature extractor to obtain semantic features. These features are then categorized by a pretrained classifier to obtain semantic labels. The semantic features and semantic labels, along with the input RGB and depth data, then serve as supervision signals. Meanwhile, the semantic features and input RGB-D data are propagated to the 3D Gaussian blobs as initial properties of the Gaussian representation. Rendered semantic features, RGB, and depth are obtained from 3D Gaussian splatting, while rendered semantic labels are obtained by classifying the rendered features. Supervision and rendered information are used for loss construction to optimize camera poses and the 3D Gaussian representation. During the SLAM process, we use semantic-informed bundle adjustment based on multi-frame constraints for joint optimization of poses and the 3D Gaussian representation.
  • Figure 3: Qualitative comparison of the rendering quality of our method and the baseline. We visualize 5 selected scenes from the Replica (Straub et al., 2019) and ScanNet (Dai et al., 2017) datasets. Details are highlighted with red boxes. Our method achieves photo-realistic rendering quality and more complete reconstruction, especially in areas with rich textural information.
  • Figure 4: Qualitative comparison of semantic novel view synthesis on 3 scenes of Replica (Straub et al., 2019).
  • Figure 5: Semantic rendering results and ground truth labels for the feature-level loss ablation on two scenes of Replica (Straub et al., 2019).
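
The rendering path described in the Figure 2 caption (splat per-Gaussian semantic features into a pixel, then classify the rendered feature to get a label) can be sketched for a single pixel as follows. This is a minimal sketch under stated assumptions: it uses the standard front-to-back alpha compositing of 3D Gaussian splatting, and it stands in a plain linear layer with argmax for the pretrained classifier, whose actual architecture is not given here. The function names `composite_features` and `classify_feature` are hypothetical.

```python
from typing import List, Sequence


def composite_features(
    features: Sequence[Sequence[float]], alphas: Sequence[float]
) -> List[float]:
    """Front-to-back alpha compositing of per-Gaussian feature vectors.

    `features[i]` and `alphas[i]` belong to the i-th Gaussian hit by the
    pixel's ray, sorted from nearest to farthest. Each Gaussian contributes
    feature * alpha * (transmittance remaining in front of it).
    """
    if not features:
        return []
    if len(features) != len(alphas):
        raise ValueError("need one alpha per Gaussian")

    rendered = [0.0] * len(features[0])
    transmittance = 1.0  # fraction of the ray not yet absorbed
    for feature, alpha in zip(features, alphas):
        weight = transmittance * alpha
        for k, value in enumerate(feature):
            rendered[k] += weight * value
        transmittance *= 1.0 - alpha
    return rendered


def classify_feature(
    feature: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> int:
    """Hypothetical linear classifier: return the index of the largest logit."""
    logits = [
        sum(w * x for w, x in zip(row, feature)) + b
        for row, b in zip(weights, bias)
    ]
    return max(range(len(logits)), key=logits.__getitem__)


if __name__ == "__main__":
    # Two Gaussians along one ray, each carrying a 2-D semantic feature.
    rendered = composite_features([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
    label = classify_feature(rendered, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    print(rendered, label)
```

Because the same compositing weights are used for color, depth, and semantic features, the rendered feature stays differentiable with respect to both the Gaussian parameters and the camera pose. This is what allows the losses described in the caption to drive tracking and mapping.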