4D Gaussian Splatting SLAM
Yanyan Li, Youxu Fang, Zunjie Zhu, Kunyi Li, Yong Ding, Federico Tombari
TL;DR
The study tackles simultaneous camera localization and 4D reconstruction of dynamic scenes with a 4D Gaussian Splatting SLAM framework that separates scene content into static and dynamic Gaussian primitives. Motion masks guide initialization and tracking. Two-stage 4D mapping then models dynamic motion with sparse control-point deformation and an MLP, supervised by flow losses computed from a novel Optical Flow Map Rendering signal. On real-world RGB-D sequences, the approach achieves robust pose estimation and high-fidelity dynamic scene reconstruction, outperforming several static GS-SLAM and NeRF-based dynamic SLAM baselines. By combining motion-aware Gaussian representations with flow-based supervision, the method supports efficient online updates with strong geometric consistency, improving view synthesis and enabling potential deployment in robotics and AR.
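The motion-mask-guided initialization can be illustrated with a minimal sketch. It assumes a pinhole camera, one Gaussian seed per pixel with valid depth, and a boolean per-pixel motion mask. The names `Intrinsics`, `GaussianSeed`, `GaussianSets`, `backproject`, and `init_gaussians` are hypothetical and do not come from the authors' code. Each pixel with a depth measurement is back-projected into camera coordinates and routed to the dynamic set if the mask marks it as moving, otherwise to the static set.

```python
from dataclasses import dataclass, field


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics (hypothetical container)."""
    fx: float
    fy: float
    cx: float
    cy: float


@dataclass
class GaussianSeed:
    center: tuple  # (x, y, z) in camera coordinates
    color: tuple   # (r, g, b) sampled from the RGB image


@dataclass
class GaussianSets:
    """Static and dynamic Gaussian primitives kept in separate lists."""
    static: list = field(default_factory=list)
    dynamic: list = field(default_factory=list)


def backproject(u, v, depth, K):
    """Lift pixel (u, v) with metric depth into camera coordinates."""
    x = (u - K.cx) * depth / K.fx
    y = (v - K.cy) * depth / K.fy
    return (x, y, depth)


def init_gaussians(rgb, depth, motion_mask, K):
    """Seed one Gaussian per valid-depth pixel, split by the motion mask.

    rgb[v][u] -> (r, g, b); depth[v][u] -> float (0 means missing);
    motion_mask[v][u] -> True if the pixel lies on a moving object.
    """
    sets = GaussianSets()
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0:
                continue  # no depth measurement: nothing to back-project
            seed = GaussianSeed(backproject(u, v, d, K), rgb[v][u])
            target = sets.dynamic if motion_mask[v][u] else sets.static
            target.append(seed)
    return sets
```

In a design like this, keeping the two sets apart means motion parameters only need to be optimized for the dynamic seeds, while the static seeds could serve camera tracking without being corrupted by moving objects.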
Abstract
Simultaneously localizing camera poses and constructing Gaussian radiance fields in dynamic scenes establishes a crucial bridge between 2D images and the 4D real world. Instead of removing dynamic objects as distractors and reconstructing only static environments, this paper proposes an efficient architecture that incrementally tracks camera poses and builds 4D Gaussian radiance fields of unknown scenes from a sequence of RGB-D images. First, we generate motion masks to obtain static and dynamic priors for each pixel. To eliminate the influence of static scenes and improve the efficiency of learning the motion of dynamic objects, we classify the Gaussian primitives into static and dynamic sets, and use sparse control points together with an MLP to model the transformation fields of the dynamic Gaussians. To learn the motion of dynamic Gaussians more accurately, we design a novel 2D optical flow map reconstruction algorithm that renders the optical flow of dynamic objects between neighboring images; these rendered flows supervise the 4D Gaussian radiance fields alongside traditional photometric and geometric constraints. Qualitative and quantitative experiments show that the proposed method achieves robust tracking and high-quality view synthesis in real-world environments.
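The control-point deformation and optical flow rendering can be sketched in simplified form. This is an illustration under explicit assumptions, not the authors' implementation. Control points carry translations only and are blended with Gaussian radial-basis weights, a translation-only simplification of skinning-style deformation. A hypothetical callable `motion(p, t)` stands in for the MLP. Each dynamic Gaussian is treated as an isotropic 2D splat with a pixel radius. Both frames are assumed to share one camera frame, whereas a full SLAM system would first apply each frame's estimated pose. Per-Gaussian flow is the displacement of its projected center between times `t0` and `t1`. The per-pixel flow is the front-to-back alpha composite of those displacements, mirroring how colors are composited in Gaussian splatting.

```python
import math
from dataclasses import dataclass


@dataclass
class Intrinsics:
    fx: float
    fy: float
    cx: float
    cy: float


def project(x, K):
    """Pinhole projection of a camera-frame point (x, y, z) to pixel (u, v)."""
    return (K.fx * x[0] / x[2] + K.cx, K.fy * x[1] / x[2] + K.cy)


def deform(center, control_points, translations, sigma=0.5):
    """Move a Gaussian center by an RBF-weighted blend of control-point translations.

    Assumption: translation-only control points with Gaussian RBF weights.
    """
    weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(center, p)) / (2 * sigma ** 2))
               for p in control_points]
    total = sum(weights)
    if total == 0.0:
        return tuple(center)  # too far from every control point: leave in place
    return tuple(c + sum(w * d[i] for w, d in zip(weights, translations)) / total
                 for i, c in enumerate(center))


def render_flow(pixel, gaussians, control_points, motion, t0, t1, K):
    """Alpha-composite per-Gaussian 2D flow at one pixel, front to back.

    gaussians: list of (center, opacity, radius_px) for the dynamic set.
    motion(p, t): hypothetical stand-in for the MLP; returns the 3D
    translation of control point p at time t.
    """
    d0 = [motion(p, t0) for p in control_points]
    d1 = [motion(p, t1) for p in control_points]
    splats = []
    for center, opacity, radius in gaussians:
        x0 = deform(center, control_points, d0)
        x1 = deform(center, control_points, d1)
        uv0, uv1 = project(x0, K), project(x1, K)
        flow = (uv1[0] - uv0[0], uv1[1] - uv0[1])
        # Isotropic 2D footprint centered on the projection at t0.
        dist2 = (pixel[0] - uv0[0]) ** 2 + (pixel[1] - uv0[1]) ** 2
        alpha = opacity * math.exp(-dist2 / (2 * radius ** 2))
        splats.append((x0[2], alpha, flow))
    splats.sort(key=lambda s: s[0])  # nearest Gaussian first
    transmittance, fu, fv = 1.0, 0.0, 0.0
    for _, alpha, flow in splats:
        fu += transmittance * alpha * flow[0]
        fv += transmittance * alpha * flow[1]
        transmittance *= 1.0 - alpha
    return (fu, fv)


def flow_l1(rendered, observed):
    """Mean L1 error between rendered and observed per-pixel flow vectors."""
    return sum(abs(r[0] - o[0]) + abs(r[1] - o[1])
               for r, o in zip(rendered, observed)) / len(rendered)
```

A flow loss such as `flow_l1` would then compare the rendered flow with flow estimated between the neighboring input images; the choice of flow estimator is left open here. Because compositing is depth-ordered, a half-transparent near Gaussian only partially hides the flow of a Gaussian behind it, just as it would partially hide its color.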
