D$^2$GSLAM: 4D Dynamic Gaussian Splatting SLAM
Siting Zhu, Yuxiang Huang, Wenhua Wu, Chaokang Jiang, Yongbo Chen, I-Ming Chen, Hesheng Wang
TL;DR
D$^2$GSLAM is a Gaussian-based dynamic SLAM framework that reconstructs both the static and dynamic parts of a scene while tracking the camera in dynamic environments. Its key ideas are a geometric-prompt dynamic separation that generates robust motion masks, a dynamic-static composite map that combines 3D static Gaussians with 4D dynamic Gaussians, retrospective frame optimization that maintains temporal coherence, and a motion-consistency loss that exploits the temporal continuity of object motion. The system combines a progressive tracking strategy with a set of losses for accurate dynamic modeling, and it outperforms state-of-the-art baselines on Bonn, TUM, and static datasets in both tracking and reconstruction metrics. Dynamic modeling does not run in real time, but motion segmentation has a practical runtime and tracking remains robust in real-world indoor environments.
Abstract
Recent advances in dense Simultaneous Localization and Mapping (SLAM) have demonstrated remarkable performance in static environments. However, dense SLAM in dynamic environments remains challenging. Most methods simply remove dynamic objects and reconstruct only the static scene, which discards the motion information these objects carry. In this paper, we present D$^2$GSLAM, a novel dynamic SLAM system built on a Gaussian representation, which simultaneously performs accurate dynamic reconstruction and robust tracking in dynamic environments. Our system has four key components: (i) We propose a geometric-prompt dynamic separation method to distinguish between static and dynamic elements of the scene. This approach leverages the geometric consistency between the Gaussian representation and the scene geometry to obtain coarse dynamic regions. These regions then serve as prompts that guide the refinement of the coarse mask into an accurate motion mask. (ii) To facilitate accurate and efficient mapping of the dynamic scene, we introduce a dynamic-static composite representation that integrates static 3D Gaussians with dynamic 4D Gaussians. This representation can model transitions of objects between static and dynamic states, enabling composite mapping and optimization. (iii) We employ a progressive pose refinement strategy that leverages both the multi-view consistency of static scene geometry and the motion information from dynamic objects to achieve accurate camera tracking. (iv) We introduce a motion consistency loss, which leverages the temporal continuity of object motion for accurate dynamic modeling. D$^2$GSLAM demonstrates superior mapping and tracking accuracy on dynamic scenes, while also showing the capability for accurate dynamic modeling.
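As a rough illustration of component (i), the sketch below shows one way a coarse dynamic region could be derived from geometric inconsistency and turned into a prompt. It is not the authors' implementation. The depth-residual test, the relative threshold `rel_thresh`, the bounding-box form of the prompt, and the function names `coarse_dynamic_mask` and `box_prompt` are all assumptions made for this example; the abstract does not specify how the consistency check or the prompts are defined.

```python
import numpy as np


def coarse_dynamic_mask(rendered_depth, observed_depth, rel_thresh=0.1):
    """Flag pixels where the observed depth disagrees with the depth
    rendered from the (assumed static) Gaussian map.

    Assumption: geometric inconsistency is measured as a relative depth
    residual; the paper's actual criterion may differ.
    """
    # Ignore pixels without a valid depth in either image.
    valid = (rendered_depth > 0) & (observed_depth > 0)
    residual = np.abs(rendered_depth - observed_depth)
    # Relative threshold: distant pixels tolerate a larger absolute error.
    return valid & (residual > rel_thresh * observed_depth)


def box_prompt(mask):
    """Return the bounding box (x_min, y_min, x_max, y_max) of a mask,
    or None if the mask is empty. Hypothetical prompt format."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Toy example: a flat wall at 2 m with an object at 1 m in front of it.
rendered = np.full((6, 8), 2.0)   # depth rendered from the static map
observed = rendered.copy()
observed[2:4, 3:6] = 1.0          # the moving object occludes the wall
mask = coarse_dynamic_mask(rendered, observed)
prompt = box_prompt(mask)
```

In a full pipeline, a box like `prompt` would be handed to a promptable segmentation model to refine the coarse region into a pixel-accurate motion mask; that refinement step is left out here because the abstract does not name the model used.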
