4D Gaussian Splatting SLAM

Yanyan Li, Youxu Fang, Zunjie Zhu, Kunyi Li, Yong Ding, Federico Tombari

TL;DR

The study tackles simultaneous camera localization and 4D reconstruction of dynamic scenes by introducing a 4D Gaussian Splatting SLAM framework that separates scene content into static and dynamic Gaussian primitives. It uses motion masks for initialization and tracking, then performs a two-stage 4D mapping in which the motion of dynamic Gaussians is modeled by sparse control-point deformation and an MLP and supervised with flow losses from a novel Optical Flow Map Rendering step. The approach achieves robust pose estimation and high-fidelity dynamic scene reconstruction on real-world RGB-D sequences, outperforming several static GS-SLAM and NeRF-based dynamic SLAM baselines. By integrating motion-aware Gaussian representations with flow-based supervision, the method provides efficient online updates with strong geometric consistency, enabling improved view synthesis and potential robotics/AR deployment.

Abstract

Simultaneously localizing camera poses and constructing Gaussian radiance fields in dynamic scenes establishes a crucial bridge between 2D images and the 4D real world. Instead of removing dynamic objects as distractors and reconstructing only static environments, this paper proposes an efficient architecture that incrementally tracks camera poses and establishes 4D Gaussian radiance fields in unknown scenarios from a sequence of RGB-D images. First, by generating motion masks, we obtain static and dynamic priors for each pixel. To eliminate the influence of static scenes and improve the efficiency of learning the motion of dynamic objects, we classify the Gaussian primitives into static and dynamic Gaussian sets, while sparse control points together with an MLP are used to model the transformation fields of the dynamic Gaussians. To learn the motion of dynamic Gaussians more accurately, a novel 2D optical flow map reconstruction algorithm is designed to render optical flows of dynamic objects between neighboring images, which are further used to supervise the 4D Gaussian radiance fields along with traditional photometric and geometric constraints. In experiments, qualitative and quantitative evaluation results show that the proposed method achieves robust tracking and high-quality view synthesis performance in real-world environments.
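
The flow supervision described above can be illustrated with a minimal sketch. It is not the paper's rendering algorithm: it only computes the 2D displacement of each dynamic Gaussian center between two neighboring frames under a pinhole camera, without the splatting and blending that a rendered flow map would need. All function names, the world-to-camera pose convention, and the L1 form of the loss are assumptions made for illustration.

```python
import numpy as np

def to_camera(points_world, R_wc, t_wc):
    """Map (N, 3) world points into the camera frame (assumed world-to-camera R, t)."""
    return points_world @ R_wc.T + t_wc

def project(points_cam, fx, fy, cx, cy):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixel coordinates."""
    z = points_cam[:, 2]
    u = fx * points_cam[:, 0] / z + cx
    v = fy * points_cam[:, 1] / z + cy
    return np.stack([u, v], axis=1)

def gaussian_center_flow(mu_t, mu_t1, pose_t, pose_t1, intrinsics):
    """Per-Gaussian 2D flow: projected center at frame t+1 minus projected center at frame t."""
    fx, fy, cx, cy = intrinsics
    uv_t = project(to_camera(mu_t, *pose_t), fx, fy, cx, cy)
    uv_t1 = project(to_camera(mu_t1, *pose_t1), fx, fy, cx, cy)
    return uv_t1 - uv_t

def masked_flow_l1(flow_pred, flow_ref, dynamic_mask):
    """Mean absolute flow error over entries flagged as dynamic (hypothetical loss form)."""
    dynamic_mask = np.asarray(dynamic_mask, dtype=bool)
    if not dynamic_mask.any():
        return 0.0
    return float(np.abs(flow_pred[dynamic_mask] - flow_ref[dynamic_mask]).mean())

if __name__ == "__main__":
    identity_pose = (np.eye(3), np.zeros(3))
    intrinsics = (500.0, 500.0, 320.0, 240.0)  # fx, fy, cx, cy (made-up values)
    mu_t = np.array([[0.0, 0.0, 2.0]])   # Gaussian center at frame t
    mu_t1 = np.array([[0.1, 0.0, 2.0]])  # same Gaussian after a deformation step
    print(gaussian_center_flow(mu_t, mu_t1, identity_pose, identity_pose, intrinsics))
```

In this toy setup a center 2 m in front of a static camera moves 0.1 m sideways, which corresponds to a horizontal flow of about 25 pixels at a focal length of 500. In a real system the reference flow would come from an external optical flow estimate, and the predicted flow would depend on the learned deformation of the dynamic Gaussians.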

Paper Structure

This paper contains 23 sections, 10 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Example results from the proposed 4D-GS SLAM system. The top row showcases novel view synthesis and Gaussian visualizations in the BONN balloon (top left) and person_tracking (top right) sequences. The appearance and geometry of static and dynamic scenes are shown in the bottom row, respectively.
  • Figure 2: Architecture of the proposed Gaussian Splatting SLAM. The inputs to our system are temporally ordered RGB-D image sequences and motion masks. In the initial frame, dynamic and static Gaussians are independently initialized using a motion mask, and sparse control points are established according to the spatial distribution of dynamic Gaussians. The static structure is subsequently employed for camera pose estimation through photometric and geometric constraints. Following keyframe insertion, we co-optimize Gaussian attributes and camera poses while simultaneously estimating temporal motion patterns of dynamic Gaussians.
  • Figure 3: Visual comparison of the rendering images on the TUM RGB-D dataset.
  • Figure 4: The comparison of rendering results with different mapping strategies.
  • Figure 5: Visual comparison of the rendering image on the BONN RGB-D dataset. This is also supported by the quantitative results in the BONN rendering table. More qualitative results have been added in Supplementary.
  • ...and 2 more figures
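
The initialization step described in the Figure 2 caption can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the authors' implementation: it back-projects a depth image into 3D points, splits them into static and dynamic sets with the motion mask, and picks sparse control points as voxel-grid centroids of the dynamic points. The caption only says control points follow the spatial distribution of the dynamic Gaussians, so the voxel-grid choice, the function names, and the `voxel_size` parameter are all hypothetical.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth image to an (H, W, 3) array of camera-frame points."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def split_by_motion_mask(points, depth, motion_mask):
    """Return (static_points, dynamic_points), skipping pixels without valid depth."""
    motion_mask = np.asarray(motion_mask, dtype=bool)
    valid = depth > 0
    static_points = points[valid & ~motion_mask]
    dynamic_points = points[valid & motion_mask]
    return static_points, dynamic_points

def voxel_control_points(dynamic_points, voxel_size):
    """Sparse control points as centroids of occupied voxels (assumed strategy)."""
    if len(dynamic_points) == 0:
        return np.empty((0, 3))
    keys = np.floor(dynamic_points / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # guard against shape differences across NumPy versions
    n_voxels = int(inverse.max()) + 1
    sums = np.zeros((n_voxels, 3))
    np.add.at(sums, inverse, dynamic_points)  # accumulate points per voxel
    counts = np.bincount(inverse, minlength=n_voxels)
    return sums / counts[:, None]

if __name__ == "__main__":
    depth = np.ones((4, 4))
    mask = np.zeros((4, 4), dtype=bool)
    mask[1:3, 1:3] = True  # a 2x2 "moving object" region
    pts = backproject(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
    static_pts, dynamic_pts = split_by_motion_mask(pts, depth, mask)
    controls = voxel_control_points(dynamic_pts, voxel_size=10.0)
    print(len(static_pts), len(dynamic_pts), controls.shape)
```

The static set would then seed the Gaussians used for pose tracking, while the dynamic set and its control points would be handed to the deformation model. A real system would also initialize per-Gaussian color, scale, rotation, and opacity, which this sketch omits.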