OpenGS-SLAM: Open-Set Dense Semantic SLAM with 3D Gaussian Splatting for Object-Level Scene Understanding
Dianyi Yang, Yu Gao, Xihan Wang, Yufeng Yue, Yi Yang, Mengyin Fu
TL;DR
OpenGS-SLAM tackles open-set dense semantic SLAM by attaching explicit semantic labels to each Gaussian in a 3D Gaussian Splatting representation, enabling online 3D object-level scene understanding. It introduces Gaussian Voting Splatting for fast 2D label rendering, Confidence-based 2D Label Consensus to stabilize cross-view labeling, and Segmentation Counter Pruning to refine segmentation, all powered by an Ensemble Semantic Information Generator that leverages 2D foundation vision models without extra training. The combination yields 10 times faster semantic rendering and 2 times lower storage costs than existing methods, while achieving strong tracking and reconstruction performance on the Replica and TUM datasets, demonstrating practical impact for open-world robotic mapping and interaction. However, the approach focuses on static scenes; extending it to dynamic environments and expanding open-set data remain future work.
Abstract
Recent advancements in 3D Gaussian Splatting have significantly improved the efficiency and quality of dense semantic SLAM. However, previous methods are generally constrained by limited-category pre-trained classifiers and implicit semantic representation, which hinder their performance in open-set scenarios and restrict 3D object-level scene understanding. To address these issues, we propose OpenGS-SLAM, an innovative framework that utilizes 3D Gaussian representation to perform dense semantic SLAM in open-set environments. Our system integrates explicit semantic labels derived from 2D foundational models into the 3D Gaussian framework, facilitating robust 3D object-level scene understanding. We introduce Gaussian Voting Splatting to enable fast 2D label map rendering and scene updating. Additionally, we propose a Confidence-based 2D Label Consensus method to ensure consistent labeling across multiple views. Furthermore, we employ a Segmentation Counter Pruning strategy to improve the accuracy of semantic scene representation. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of our method in scene understanding, tracking, and mapping, achieving 10 times faster semantic rendering and 2 times lower storage costs compared to existing methods. Project page: https://young-bit.github.io/opengs-github.github.io/.
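To make the idea of rendering a 2D label map from explicitly labeled Gaussians more concrete, here is a minimal sketch of a per-pixel weighted vote. The abstract does not specify the voting rule, so this is an assumption rather than the authors' algorithm: each Gaussian covering a pixel is assumed to vote for its label with its standard alpha-blending weight (opacity times accumulated transmittance), and the pixel takes the label with the largest total. The function name `vote_pixel_label` and the input layout are hypothetical.

```python
from collections import defaultdict


def vote_pixel_label(contributions):
    """Pick a label for one pixel by weighted voting.

    contributions: list of (label, alpha) pairs for the Gaussians that
    cover this pixel, sorted front to back. This layout is an assumption
    made for illustration, not the paper's data structure.
    """
    votes = defaultdict(float)
    transmittance = 1.0  # fraction of light not yet absorbed
    for label, alpha in contributions:
        # Assumed voting weight: the usual alpha-blending contribution.
        votes[label] += alpha * transmittance
        transmittance *= 1.0 - alpha
    if not votes:
        return None  # no Gaussian covers this pixel
    return max(votes, key=votes.get)


# Three Gaussians cover the pixel; labels 3 and 7 compete.
pixel = [(3, 0.2), (7, 0.6), (3, 0.5)]
print(vote_pixel_label(pixel))  # prints 7
```

In this example label 7 accumulates a weight of 0.48 against 0.36 for label 3, so the pixel is assigned label 7. Because each Gaussian carries a single discrete label instead of a feature vector, a vote of this kind needs only one accumulator per candidate label, which is in keeping with the rendering-speed and storage gains the abstract reports.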
