LangGS-SLAM: Real-Time Language-Feature Gaussian Splatting SLAM
Seongbo Ha, Sibaek Lee, Kyungsu Kang, Joonyeol Choi, Seungjun Tak, Hyeonwoo Yu
TL;DR
LangGS-SLAM addresses open-vocabulary, language-driven 3D perception by reconstructing a language-aligned dense feature field online within a SLAM framework. It introduces Top-K rendering for efficient semantic feature integration, a multi-criteria map management strategy that keeps the Gaussian map compact and consistent, and a hybrid field optimization that decouples geometry and semantics under real-time constraints. The approach achieves better geometric fidelity than geometry-only baselines and semantic fidelity comparable to offline dense methods while running at about 15 FPS, enabling open-set language reasoning directly over 3D scenes. The work thus bridges real-time 3D perception and language-based reasoning, with performance suitable for interactive, open-vocabulary scenarios.
Abstract
In this paper, we propose an RGB-D SLAM system that reconstructs a language-aligned dense feature field while sustaining low-latency tracking and mapping. First, we introduce a Top-K Rendering pipeline, a high-throughput and semantic-distortion-free method for efficiently rendering high-dimensional feature maps. To address the resulting semantic-geometric discrepancy and reduce memory consumption, we further design a multi-criteria map management strategy that prunes redundant or inconsistent Gaussians while preserving scene integrity. Finally, a hybrid field optimization framework jointly refines the geometric and semantic fields under real-time constraints by decoupling their optimization frequencies according to field characteristics. The proposed system achieves superior geometric fidelity compared to geometry-only baselines and semantic fidelity comparable to offline approaches while operating at 15 FPS. Our results demonstrate that online SLAM with dense, uncompressed language-aligned feature fields is both feasible and effective, bridging the gap between 3D perception and language-based reasoning.
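The abstract does not spell out how Top-K Rendering is computed, so the following is only an illustrative sketch of one plausible reading: for a single pixel, compute the usual front-to-back alpha-compositing weights, keep the K Gaussians with the largest weights, and blend only their feature vectors. The function names, the per-pixel formulation, and the renormalization of the kept weights are all assumptions made for this example, not details taken from the paper.

```python
def blending_weights(alphas):
    """Front-to-back compositing weights: w_i = alpha_i * prod_{j<i} (1 - alpha_j)."""
    weights = []
    transmittance = 1.0
    for a in alphas:
        weights.append(a * transmittance)
        transmittance *= 1.0 - a
    return weights


def render_feature_top_k(alphas, features, k):
    """Blend the feature vectors of the k highest-weight Gaussians for one pixel.

    alphas:   per-Gaussian opacities along the ray, sorted front to back.
    features: per-Gaussian feature vectors (lists of floats, equal length).
    Assumption: kept weights are renormalized to sum to 1.
    """
    if not features:
        return []
    weights = blending_weights(alphas)
    # Indices of the k Gaussians that contribute most to this pixel.
    top = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
    total = sum(weights[i] for i in top)
    dim = len(features[0])
    out = [0.0] * dim
    if total == 0.0:
        return out
    for i in top:
        scale = weights[i] / total
        for d in range(dim):
            out[d] += scale * features[i][d]
    return out


# Hypothetical example: three Gaussians with 2-D features on one ray.
pixel_feature = render_feature_top_k(
    alphas=[0.5, 0.5, 0.5],
    features=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    k=2,
)
```

In this reading, the per-pixel cost of feature blending depends on K rather than on the number of Gaussians along the ray, which is one way a high-dimensional feature map could be rendered without first compressing the features.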
