ADD-SLAM: Adaptive Dynamic Dense SLAM with Gaussian Splatting

Wenhua Wu, Chenpeng Su, Siting Zhu, Tianchen Deng, Zhe Liu, Hesheng Wang

TL;DR

ADD-SLAM addresses dynamic scenes in dense SLAM by checking scene consistency between current observations and a historical Gaussian map, which lets it identify dynamics without semantic priors. It represents the scene with 3D Gaussian splats, maintaining a static map G_s and per-object dynamic maps G_d^{id}(t), and uses rendering-driven inconsistencies to drive adaptive dynamic segmentation and 2D object tracking. A dynamic-static composite mapping pipeline removes dynamic regions from the static map, densifies static regions online, and constructs a per-object temporal Gaussian model for dynamics, all underpinned by a camera-tracking objective and loop closure with DBA. Across Bonn, DAVIS, and TUM RGB-D, ADD-SLAM achieves state-of-the-art localization and rich dynamic object modeling, supporting its practical value for robotic perception in dynamic environments.

Abstract

Recent advancements in Neural Radiance Fields (NeRF) and 3D Gaussian-based Simultaneous Localization and Mapping (SLAM) methods have demonstrated exceptional localization precision and remarkable dense mapping performance. However, dynamic objects introduce critical challenges by disrupting scene consistency, leading to tracking drift and mapping artifacts. Existing methods that employ semantic segmentation or object detection for dynamic identification and filtering typically rely on predefined categorical priors, while discarding dynamic scene information crucial for robotic applications such as dynamic obstacle avoidance and environmental interaction. To overcome these challenges, we propose ADD-SLAM: an Adaptive Dynamic Dense SLAM framework based on Gaussian splatting. We design an adaptive dynamic identification mechanism grounded in scene consistency analysis, comparing geometric and textural discrepancies between real-time observations and historical maps. Our method requires no predefined semantic category priors and adaptively discovers scene dynamics. Precise dynamic object recognition effectively mitigates interference from moving targets during localization. Furthermore, we propose a dynamic-static separation mapping strategy that constructs a temporal Gaussian model to achieve online incremental dynamic modeling. Experiments conducted on multiple dynamic datasets demonstrate our method's flexible and accurate dynamic segmentation capabilities, along with state-of-the-art performance in both localization and mapping.
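
To make the consistency idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the historical map has already been rendered into a color image and a depth image at the current camera pose, and it flags pixels whose observed depth (geometric cue) or color (textural cue) deviates from that rendering. The function name `inconsistency_mask` and the thresholds `depth_tol` and `color_tol` are hypothetical choices made for illustration; the paper's actual criterion is not given in this section.

```python
import numpy as np

def inconsistency_mask(obs_rgb, obs_depth, ren_rgb, ren_depth,
                       depth_tol=0.05, color_tol=0.1):
    """Flag pixels where the observation disagrees with the map rendering.

    obs_rgb, ren_rgb:     (H, W, 3) float arrays in [0, 1]
    obs_depth, ren_depth: (H, W) float arrays in meters, 0 = no measurement
    depth_tol, color_tol: hypothetical thresholds, chosen for illustration
    """
    # Only compare pixels that have a depth value in both images.
    valid = (obs_depth > 0) & (ren_depth > 0)
    # Geometric discrepancy: absolute depth difference per pixel.
    depth_err = np.abs(obs_depth - ren_depth)
    # Textural discrepancy: mean absolute color difference per pixel.
    color_err = np.abs(obs_rgb - ren_rgb).mean(axis=-1)
    return valid & ((depth_err > depth_tol) | (color_err > color_tol))

# Toy example: a flat static scene with a 2x2 object that moved in front of it.
ren_depth = np.full((4, 4), 2.0)
ren_rgb = np.full((4, 4, 3), 0.5)
obs_depth = ren_depth.copy()
obs_rgb = ren_rgb.copy()
obs_depth[1:3, 1:3] = 1.0   # object is closer than the mapped background
obs_rgb[1:3, 1:3] = 0.9     # and has a different appearance
obs_depth[0, 0] = 0.0       # missing sensor reading, must not be flagged

mask = inconsistency_mask(obs_rgb, obs_depth, ren_rgb, ren_depth)
print(mask.astype(int))
```

In this toy case only the 2x2 block is flagged, and the pixel with a missing depth reading is ignored. A per-pixel threshold like this is only a starting point; turning such a mask into per-object segments and tracks would require further steps that this sketch does not attempt.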

Paper Structure

This paper contains 14 sections, 22 equations, 23 figures, 5 tables.

Figures (23)

  • Figure 1: ADD-SLAM. Given an RGB-D stream, our method achieves precise camera pose tracking while constructing dynamic-static composite maps. It adaptively segments dynamic objects of any category without semantic priors. The illustration presents dynamic tracking and mapping results, along with high-quality renderings of the dynamic-static separation.
  • Figure 2: Overview of ADD-SLAM. The input RGB-D stream is first used to initialize the static map with the first frame. Dynamic objects in the environment are then adaptively segmented based on consistency analysis, and a dynamic tracking sequence is constructed. Building upon this, dynamic-static separation is performed on the original static map. Camera pose optimization is carried out using tracking loss, followed by dynamic-static map optimization.
  • Figure 3: Rendering Visualization. Compared to other methods, ADD-SLAM not only accurately reconstructs the static background but also captures fine details of the dynamic foreground. Our dynamic mask is more complete and precise than the uncertainty map of WildGS-SLAM [zheng2025wildgs].
  • Figure 4: Comparison between our dynamic mask and the uncertainty map of WildGS-SLAM [zheng2025wildgs] on the DAVIS dataset.
  • Figure 5: Comparison of the dynamic segmentation results of our method with those obtained using a semantic segmentation network.
  • ...and 18 more figures