GARAD-SLAM: 3D GAussian splatting for Real-time Anti Dynamic SLAM

Mingrui Li, Weijian Chen, Na Cheng, Jingyuan Xu, Dong Li, Hongyu Wang

TL;DR

GARAD-SLAM addresses dynamic interference in 3D Gaussian Splatting SLAM by combining back-end CRF-based segmentation over Gaussians with a Gaussian pyramid network that maps labels to front-end features, and by enforcing a dynamic rendering penalty during optimization. It directly labels Gaussians as static or dynamic by minimizing the Gibbs energy $E(X)=\sum_i \psi_u(x_i)+\sum_{i<j} \psi_p(x_i,x_j)$ with unary potentials derived from a Gaussian Mixture Model, and it uses a rendering loss $\mathcal{L}=\lambda_{p-ssim}\mathcal{L}_{p-ssim}+\lambda_{dyn}\mathcal{L}_{dyn}$ where $\mathcal{L}_{dyn}=\sum_{i\in \mathcal{G}_D}{\alpha_i}^2$. The approach tightly couples tracking and mapping, reduces dynamic artifacts, and yields state-of-the-art rendering quality while maintaining competitive tracking on real-world datasets. It runs in real time on commodity GPUs, enabling robust dense reconstruction in dynamic environments for AR/VR and robotics.
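To make the energy $E(X)$ above concrete, here is a minimal sketch of how a labeling of Gaussians could be scored. It is an illustration under stated assumptions, not the authors' implementation: the unary cost is assumed to be the negative log of a GMM-derived probability, the pairwise term is assumed to be a Potts penalty weighted by a Gaussian kernel on the centers, and the names `gibbs_energy`, `unary_from_prob`, `w`, and `theta` are hypothetical.

```python
import math


def gibbs_energy(labels, unary, positions, w=1.0, theta=0.5):
    """E(X) = sum_i psi_u(x_i) + sum_{i<j} psi_p(x_i, x_j).

    labels:    list of 0 (static) / 1 (dynamic), one per Gaussian
    unary:     unary[i][l] = cost of giving Gaussian i label l
               (assumed here: -log of a GMM-derived probability)
    positions: list of 3D Gaussian centers
    w, theta:  hypothetical pairwise weight and kernel bandwidth
    """
    n = len(labels)
    energy = sum(unary[i][labels[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] != labels[j]:  # Potts model: only disagreeing pairs pay
                d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
                energy += w * math.exp(-d2 / (2.0 * theta ** 2))
    return energy


def unary_from_prob(p_dynamic, eps=1e-9):
    """Turn P(dynamic) into [cost_static, cost_dynamic] via negative log-likelihood."""
    return [-math.log(max(1.0 - p_dynamic, eps)), -math.log(max(p_dynamic, eps))]


# Three Gaussians: two close together and likely dynamic, one far away and static.
probs = [0.9, 0.8, 0.1]
unary = [unary_from_prob(p) for p in probs]
positions = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 0.0, 0.0)]

e_consistent = gibbs_energy([1, 1, 0], unary, positions)  # neighbors agree
e_split = gibbs_energy([1, 0, 0], unary, positions)       # neighbors disagree
```

In this toy case the labeling that keeps the two nearby Gaussians in the same class has the lower energy, which is the behavior a CRF is meant to encourage: the pairwise term smooths labels among close Gaussians while the unary term keeps them tied to the per-Gaussian evidence.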

Abstract

The 3D Gaussian Splatting (3DGS)-based SLAM system has garnered widespread attention due to its excellent performance in real-time high-fidelity rendering. However, in real-world environments with dynamic objects, existing 3DGS-based SLAM systems often face mapping errors and tracking drift issues. To address these problems, we propose GARAD-SLAM, a real-time 3DGS-based SLAM system tailored for dynamic scenes. In terms of tracking, unlike traditional methods, we directly perform dynamic segmentation on Gaussians and map them back to the front-end to obtain dynamic point labels through a Gaussian pyramid network, achieving precise dynamic removal and robust tracking. For mapping, we impose rendering penalties on dynamically labeled Gaussians, which are updated through the network, to avoid irreversible erroneous removal caused by simple pruning. Our results on real-world datasets demonstrate that our method is competitive in tracking compared to baseline methods, generating fewer artifacts and higher-quality reconstructions in rendering.
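The rendering penalty described above can be sketched in a few lines. The example below follows the TL;DR formulas $\mathcal{L}=\lambda_{p-ssim}\mathcal{L}_{p-ssim}+\lambda_{dyn}\mathcal{L}_{dyn}$ and $\mathcal{L}_{dyn}=\sum_{i\in \mathcal{G}_D}{\alpha_i}^2$. The function names, the weight values, and the plain gradient step are assumptions made for illustration; the photometric-SSIM term is passed in as a precomputed number rather than rendered.

```python
def dynamic_penalty(opacities, is_dynamic):
    """L_dyn: sum of squared opacities over Gaussians labeled dynamic."""
    return sum(a * a for a, d in zip(opacities, is_dynamic) if d)


def total_loss(l_p_ssim, opacities, is_dynamic, lam_p_ssim=1.0, lam_dyn=0.1):
    """L = lam_p_ssim * L_p_ssim + lam_dyn * L_dyn (weights are placeholder values)."""
    return lam_p_ssim * l_p_ssim + lam_dyn * dynamic_penalty(opacities, is_dynamic)


def penalty_step(opacities, is_dynamic, lr=0.1, lam_dyn=0.1):
    """One gradient-descent step on the penalty term only.

    d(L_dyn)/d(alpha_i) = 2 * alpha_i for dynamic Gaussians and 0 for static ones,
    so only dynamic opacities shrink and no Gaussian is deleted.
    """
    return [a - lr * lam_dyn * 2.0 * a if d else a
            for a, d in zip(opacities, is_dynamic)]


opacities = [0.9, 0.5, 0.8]
is_dynamic = [True, False, True]

l_dyn = dynamic_penalty(opacities, is_dynamic)            # 0.9**2 + 0.8**2
loss = total_loss(0.2, opacities, is_dynamic)             # 0.2 + 0.1 * l_dyn
updated = penalty_step(opacities, is_dynamic)             # static opacity unchanged
```

Because the penalty only pushes opacity down instead of removing Gaussians, a Gaussian whose label later flips back to static simply stops receiving the penalty and can recover through the photometric term. That is the sense in which a penalty avoids the irreversible removal that pruning would cause.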

Paper Structure

This paper contains 13 sections, 14 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Rendering results on the BONN dataset. The reconstruction is generated by our GARAD-SLAM.
  • Figure 2: Framework of GARAD-SLAM. Given a sequence of RGB-D frames, we simultaneously build the Gaussian map and estimate the camera pose via a Gaussian pyramid, using the photometric-SSIM loss (weight $\lambda _{p-ssim}$) and the dynamic loss (weight $\lambda _{dyn}$).
  • Figure 3: ATE for ORB-SLAM3 and GARAD-SLAM on selected sequences of the TUM and BONN datasets.
  • Figure 4: Gaussian map of TUM RGB-D and BONN datasets.
  • Figure 5: Comparison of rendering quality.