DriveSplat: Unified Neural Gaussian Reconstruction for Dynamic Driving Scenes

Cong Wang, Ruiqi Song, Wei Tian, Chenming Zhang, Lingxi Li, Long Chen

Abstract

Reconstructing large-scale dynamic driving scenes remains challenging because static environments with extreme depth variation coexist with diverse dynamic actors exhibiting complex motions. Existing Gaussian Splatting-based methods have primarily focused on limited-scale or object-centric settings, and their applicability to large-scale dynamic driving scenes remains underexplored, particularly in the presence of extreme scale variation and non-rigid motions. In this work, we propose DriveSplat, a neural Gaussian framework that reconstructs both the static and dynamic parts of a driving scene within a single unified Gaussian-based representation. For static backgrounds, we introduce a scene-aware learnable level-of-detail (LOD) modeling strategy that explicitly accounts for near, intermediate, and far depth ranges in driving environments, enabling adaptive multi-scale Gaussian allocation. For dynamic actors, we use an object-centric formulation with neural Gaussian primitives, modeling motion through a global rigid transformation and handling non-rigid dynamics via a two-stage deformation that first adjusts anchors and subsequently updates the Gaussians. To further regularize the optimization, we incorporate dense depth and surface normal priors from pre-trained models as auxiliary supervision. Extensive experiments on the Waymo and KITTI benchmarks demonstrate that DriveSplat achieves state-of-the-art performance in novel-view synthesis while producing temporally stable and geometrically consistent reconstructions of dynamic driving scenes. Project page: https://physwm.github.io/drivesplat.
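
To make the dynamic-actor formulation in the abstract concrete, the following is a minimal sketch of the order of operations it describes: deform anchors in the object frame, update the Gaussians attached to them, then apply a global rigid transform into the world frame. It is not the authors' implementation. The function names, the sinusoidal `deform_anchor` stand-in for a learned deformation network, the fixed per-anchor offsets, and the yaw-only rotation are all assumptions made for illustration.

```python
import math


def rotate_z(point, yaw):
    """Rotate a 3D point about the z-axis by `yaw` radians."""
    c, s = math.cos(yaw), math.sin(yaw)
    x, y, z = point
    return (c * x - s * y, s * x + c * y, z)


def deform_anchor(anchor, t):
    """Stage 1 (hypothetical stand-in): time-dependent anchor displacement.

    A learned deformation network would go here; a small sinusoidal
    vertical offset is used only to keep the sketch runnable.
    """
    x, y, z = anchor
    return (x, y, z + 0.1 * math.sin(t))


def update_gaussians(anchor, offsets):
    """Stage 2: place Gaussian centers relative to the deformed anchor."""
    return [tuple(a + o for a, o in zip(anchor, off)) for off in offsets]


def actor_to_world(anchors, offsets, t, yaw, translation):
    """Apply the two-stage deformation, then the global rigid transform."""
    centers = []
    for anchor in anchors:
        moved = deform_anchor(anchor, t)             # stage 1: anchors
        for g in update_gaussians(moved, offsets):   # stage 2: Gaussians
            r = rotate_z(g, yaw)                     # rigid rotation (assumed yaw-only)
            centers.append(tuple(ri + ti for ri, ti in zip(r, translation)))
    return centers


if __name__ == "__main__":
    pts = actor_to_world(
        anchors=[(1.0, 0.0, 0.0)],
        offsets=[(0.0, 0.0, 0.0), (0.0, 0.0, 0.5)],
        t=0.0,
        yaw=math.pi / 2,
        translation=(10.0, 0.0, 0.0),
    )
    print(pts)
```

Keeping the non-rigid deformation in the object frame and applying the rigid pose last means the deformation does not need to account for the actor's global trajectory, which is one plausible reading of the object-centric design described above.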

Paper Structure

This paper contains 40 sections, 18 equations, 16 figures, 10 tables.

Figures (16)

  • Figure 1: Comparison with StreetGS. StreetGS generates an excessive number of redundant Gaussians (yellow circles) in the reconstructed 3D scene. The right panel presents the rendered image, depth map, and normal map from a novel view (red dots).
  • Figure 2: Overall pipeline of DriveSplat. A dynamic-static decoupling paradigm is adopted, where neural Gaussian representations with partitioned voxel structures are applied for background reconstruction, while a deformation field network models the temporal dynamics of each non-rigid actor. Depth maps and normal priors are incorporated to enhance geometric accuracy.
  • Figure 3: Overview of the proposed multi-scale background representation and view-adaptive LOD allocation. (a) The static background is modeled using a multi-scale Gaussian representation. (b) Geometry-guided near/mid/far regions provide structural priors, while the effective LOD of each anchor is dynamically selected based on the current camera viewpoint.
  • Figure 4: Visualization of voxel representation at different levels. As the level increases, the voxel resolution gradually improves. The corresponding partitioned neural Gaussians are shown in the bottom row.
  • Figure 5: Pipeline for dynamic non-rigid actor modeling. Our two-stage pipeline first estimates anchor-level motion and then updates the corresponding neural Gaussian parameters.
  • ...and 11 more figures
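
The captions for Figures 3 and 4 describe near/mid/far depth regions and a view-dependent choice of LOD per anchor. The sketch below illustrates one way such a selection could work, and is not taken from the paper. The depth thresholds (`near_max`, `far_min`), the reference distance `d_max`, the number of levels, and the log-distance rule are all assumptions; the paper's learnable LOD strategy may differ substantially. Level 0 is treated as the coarsest level, matching the Figure 4 statement that resolution increases with level.

```python
import math


def depth_region(distance, near_max=15.0, far_min=60.0):
    """Classify a distance into near / mid / far (thresholds are hypothetical)."""
    if distance < near_max:
        return "near"
    if distance < far_min:
        return "mid"
    return "far"


def visible_lod(distance, d_max=120.0, num_levels=4):
    """Finest LOD level worth rendering at `distance` (level 0 is coarsest).

    Uses an assumed log-distance rule: each halving of the distance
    unlocks one finer level, clamped to the available levels.
    """
    d = max(distance, 1e-6)
    level = int(math.floor(math.log2(d_max / d)))
    return min(max(level, 0), num_levels - 1)


def select_anchors(anchors, camera, d_max=120.0, num_levels=4):
    """Keep anchors whose level is no finer than the visible level.

    `anchors` is a list of (position, level) pairs.
    """
    kept = []
    for position, level in anchors:
        dist = math.dist(position, camera)
        if level <= visible_lod(dist, d_max, num_levels):
            kept.append((position, level, depth_region(dist)))
    return kept


if __name__ == "__main__":
    anchors = [
        ((0.0, 0.0, 100.0), 0),  # far, coarse: kept
        ((0.0, 0.0, 100.0), 2),  # far, fine: skipped
        ((0.0, 0.0, 10.0), 3),   # near, fine: kept
    ]
    for entry in select_anchors(anchors, camera=(0.0, 0.0, 0.0)):
        print(entry)
```

Under this rule, distant regions are covered only by coarse anchors while nearby regions also activate finer ones, which is the general effect that view-adaptive LOD allocation aims for.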