STAMICS: Splat, Track And Map with Integrated Consistency and Semantics for Dense RGB-D SLAM
Yongxu Wang, Xu Cao, Weiyun Yi, Zhaoxin Fan
TL;DR
STAMICS tackles semantic drift in dense RGB-D SLAM by fusing semantic information with 3D Gaussian splatting. It introduces semantically enhanced Gaussian representations, a temporal semantic consistency pipeline, and open-vocabulary expansion to label unseen objects, all optimized via differentiable rendering. The framework improves pose accuracy and map fidelity across multiple benchmarks, outperforming or matching state-of-the-art methods while handling dynamic and diverse environments. The approach advances dense SLAM by keeping semantics coherent over time and the label vocabulary flexible, with practical implications for robust autonomous perception.
Abstract
Simultaneous Localization and Mapping (SLAM) is a critical task in robotics, enabling systems to autonomously navigate and understand complex environments. Current SLAM approaches predominantly rely on geometric cues for mapping and localization, but they often fail to ensure semantic consistency, particularly in dynamic or densely populated scenes. To address this limitation, we introduce STAMICS, a novel method that integrates semantic information with 3D Gaussian representations to enhance both localization and mapping accuracy. STAMICS consists of three key components: a 3D Gaussian-based scene representation for high-fidelity reconstruction, a graph-based clustering technique that enforces temporal semantic consistency, and an open-vocabulary system that allows for the classification of unseen objects. Extensive experiments show that STAMICS significantly improves camera pose estimation and map quality, outperforming state-of-the-art methods while reducing reconstruction errors. Code will be made publicly available.
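
To make the idea of graph-based temporal semantic consistency more concrete, the following is a minimal sketch and not the STAMICS implementation. It assumes that each node is one per-frame object observation carrying a predicted label, and that an edge links two observations judged to be the same object across frames (for example by mask overlap; that matching step is not shown). Connected components are found with union-find, and every observation in a component receives the component's majority label. The function name `consistent_labels` and the input layout are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def consistent_labels(observations, edges):
    """Assign one label per connected group of observations (illustrative only).

    observations: list of (frame_index, label) tuples, one per detected segment.
    edges: list of (i, j) index pairs linking observations assumed to be
           the same physical object in different frames.
    Returns a list of labels, one per observation, after majority voting.
    """
    parent = list(range(len(observations)))

    def find(i):
        # Find the component root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the components of every linked pair of observations.
    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    # Count label votes inside each component.
    votes = defaultdict(Counter)
    for idx, (_, label) in enumerate(observations):
        votes[find(idx)][label] += 1

    # Every observation takes the most frequent label of its component.
    return [votes[find(idx)].most_common(1)[0][0]
            for idx in range(len(observations))]


# Hypothetical input: one object seen in three frames with one inconsistent
# label, and a second object seen in two frames.
observations = [(0, "chair"), (1, "chair"), (2, "sofa"),
                (0, "table"), (1, "table")]
edges = [(0, 1), (1, 2), (3, 4)]
labels = consistent_labels(observations, edges)
```

In this toy input, the single "sofa" observation is linked to two "chair" observations and is therefore relabeled by the vote. A real system would also need a rule for ties and for weighting observations by confidence, which this sketch leaves out.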
