SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving

Su Sun, Cheng Zhao, Zhuoyang Sun, Yingjie Victor Chen, Mei Chen

TL;DR

SplatFlow tackles dynamic urban scene reconstruction without expensive 3D bounding-box annotations by introducing Self-Supervised Dynamic Gaussian Splatting within Neural Motion Flow Fields (NMFF). It represents static background with 3D Gaussians and dynamic objects with 4D Gaussians, while NMFF jointly models temporal correspondences of Gaussians and LiDAR points to enforce cross-view consistency. A LiDAR-based motion prior is learned via NMFF pretraining, and optical-flow distillation from 2D foundation models further enhances dynamic-object identification. Across Waymo Open and KITTI, SplatFlow achieves state-of-the-art performance for image reconstruction and novel-view synthesis, while operating without tracked 3D bounding boxes and delivering real-time rendering speeds, highlighting its practical potential for scalable autonomous driving workflows.

Abstract

Most existing Dynamic Gaussian Splatting methods for complex dynamic urban scenarios rely on accurate object-level supervision from expensive manual labeling, which limits their scalability in real-world applications. In this paper, we introduce SplatFlow, a self-supervised Dynamic Gaussian Splatting method within Neural Motion Flow Fields (NMFF) that learns 4D space-time representations without requiring tracked 3D bounding boxes, enabling accurate dynamic scene reconstruction and novel view RGB/depth/flow synthesis. SplatFlow integrates a time-dependent 4D Gaussian representation within NMFF in a single unified framework, where NMFF is a set of implicit functions that model the temporal motions of both LiDAR points and Gaussians as continuous motion flow fields. Leveraging NMFF, SplatFlow decomposes the scene into static background and dynamic objects, representing them with 3D and 4D Gaussian primitives, respectively. NMFF also models the correspondences of each 4D Gaussian across time, which aggregates temporal features to enhance the cross-view consistency of dynamic components. SplatFlow further improves dynamic object identification by distilling features from 2D foundation models into the 4D space-time representation. Comprehensive evaluations on the Waymo and KITTI datasets validate SplatFlow's state-of-the-art (SOTA) performance for both image reconstruction and novel view synthesis in dynamic urban scenarios.
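
To make the idea of a motion flow field concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes the field can be caricatured as a small MLP that maps a 3D position and a timestamp to a 3D velocity, that points are advanced with a single Euler step, and that a point counts as dynamic when its predicted speed exceeds a threshold. The names `TinyFlowField`, `warp_points`, `split_static_dynamic`, and `speed_threshold` are all hypothetical, and the weights are random rather than trained.

```python
import numpy as np


class TinyFlowField:
    """Hypothetical stand-in for a motion flow field: (x, y, z, t) -> 3D velocity."""

    def __init__(self, hidden=32, seed=0):
        # Random, untrained weights; a real model would be fit to data.
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.5, size=(4, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.5, size=(hidden, 3))
        self.b2 = np.zeros(3)

    def __call__(self, points, t):
        # points: (N, 3) positions; t: scalar timestamp shared by all points.
        t_col = np.full((points.shape[0], 1), float(t))
        inputs = np.concatenate([points, t_col], axis=1)
        hidden = np.tanh(inputs @ self.w1 + self.b1)
        return hidden @ self.w2 + self.b2  # (N, 3) velocities


def warp_points(field, points, t, dt):
    """Move points from time t to t + dt with one explicit Euler step."""
    return points + dt * field(points, t)


def split_static_dynamic(field, points, t, speed_threshold=0.1):
    """Return a boolean mask that is True where the predicted speed is large.

    The thresholding rule is an assumption made for illustration only.
    """
    speed = np.linalg.norm(field(points, t), axis=1)
    return speed > speed_threshold


if __name__ == "__main__":
    field = TinyFlowField()
    points = np.random.default_rng(1).uniform(-1.0, 1.0, size=(5, 3))
    moved = warp_points(field, points, t=0.0, dt=0.1)
    dynamic_mask = split_static_dynamic(field, points, t=0.0)
    print(moved.shape, dynamic_mask)
```

In this toy setup, points flagged by the mask would be the candidates for a time-dependent (4D) representation, while the rest would stay in a static (3D) one; the same field can be queried for either LiDAR points or Gaussian centers because it only takes positions and a timestamp.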

Paper Structure

This paper contains 23 sections, 17 equations, 17 figures, 6 tables.

Figures (17)

  • Figure 1: Top: Street GS [yan2024street]; Middle: PVG [chen2023periodic]; Bottom: Our SplatFlow. SplatFlow eliminates the need for the 3D bounding boxes required by Street GS, and enhances rendering quality compared to PVG.
  • Figure 2: The pipeline of SplatFlow.
  • Figure 3: Visualization of 3D LiDAR points within NMFF on Waymo dataset.
  • Figure 4: Visual comparison of novel view synthesis on Waymo dataset. Bounding boxes indicate the zoomed-in dynamic areas.
  • Figure 5: Dynamic object decomposition comparison on Waymo.
  • ...and 12 more figures