Visibility-Aware Densification for 3D Gaussian Splatting in Dynamic Urban Scenes

Yikang Zhang, Rui Fan

TL;DR

This work tackles the difficulty of recovering complete geometry with 3D Gaussian splatting in dynamic urban scenes, where view overlap is sparse. It introduces VAD-GS, a visibility-aware densification framework that combines voxel-based visibility reasoning, diversity-aware view selection, and multi-view stereo (MVS) reconstruction to recover missing structures and seed new Gaussians beyond the initial point cloud. By enforcing multi-view geometric priors and densifying selectively, the approach achieves state-of-the-art rendering quality and more consistent geometry for both static and dynamic objects on the Waymo Open Dataset and nuScenes. The results demonstrate practical gains for robust novel view synthesis in challenging urban environments, and the authors plan to release the source code.
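
The voxel-based visibility reasoning step can be pictured as a z-buffer over voxels. Figure 3(c) describes rasterizing the distances and indices of visible voxels into a dense depth map and a pixel-voxel mapping. The Python sketch below illustrates that idea under stated assumptions and is not the VAD-GS implementation. It assumes that voxel centers are already expressed in the reference camera frame, that the camera is an ideal pinhole, and that each voxel is splatted as a small square whose size (`half_extent_px`) is a hypothetical parameter. All function names are hypothetical.

```python
import math
from typing import List, Optional, Set, Tuple

Vec3 = Tuple[float, float, float]


def project(point: Vec3, fx: float, fy: float, cx: float, cy: float) -> Optional[Tuple[int, int, float]]:
    """Pinhole projection of a camera-frame point to (column, row, depth); None if behind the camera."""
    x, y, z = point
    if z <= 0.0:
        return None
    return math.floor(fx * x / z + cx), math.floor(fy * y / z + cy), z


def rasterize_voxels(
    voxel_centers: List[Vec3],
    width: int,
    height: int,
    intrinsics: Tuple[float, float, float, float],
    half_extent_px: int = 1,
) -> Tuple[List[List[float]], List[List[int]]]:
    """Z-buffer voxels into a depth map and a pixel -> voxel index map (-1 marks an empty pixel)."""
    fx, fy, cx, cy = intrinsics
    depth = [[math.inf] * width for _ in range(height)]
    index = [[-1] * width for _ in range(height)]
    for i, center in enumerate(voxel_centers):
        proj = project(center, fx, fy, cx, cy)
        if proj is None:
            continue
        u, v, z = proj
        # Hypothetical square footprint: denser than projecting a single point per voxel.
        for py in range(v - half_extent_px, v + half_extent_px + 1):
            for px in range(u - half_extent_px, u + half_extent_px + 1):
                # Keep the nearest voxel per pixel (standard depth test).
                if 0 <= px < width and 0 <= py < height and z < depth[py][px]:
                    depth[py][px] = z
                    index[py][px] = i
    return depth, index


def visible_voxels(index: List[List[int]]) -> Set[int]:
    """A voxel counts as visible if it wins the depth test at one or more pixels."""
    return {i for row in index for i in row if i >= 0}


def flag_invisible_points(point_voxel_ids: List[int], visible: Set[int]) -> List[bool]:
    """True for points (e.g. captured from other views) whose voxel is not visible in the reference view."""
    return [vid not in visible for vid in point_voxel_ids]
```

In a toy scene with two voxels on the same camera ray, the farther one never wins the depth test and is reported as invisible, which mirrors the occlusion case in Figure 3(b). Points from other views that fall into such voxels are the kind of geometry that, in this reading, would need extra supporting views.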

Abstract

3D Gaussian splatting (3DGS) has demonstrated impressive performance in synthesizing high-fidelity novel views. Nonetheless, its effectiveness critically depends on the quality of the initialized point cloud. Specifically, achieving uniform and complete point coverage over the underlying scene structure requires overlapping observation frustums, an assumption that is often violated in unbounded, dynamic urban environments. Training Gaussian models with partially initialized point clouds often leads to distortions and artifacts, as camera rays may fail to intersect valid surfaces, resulting in incorrect gradient propagation to Gaussian primitives associated with occluded or invisible geometry. Additionally, existing densification strategies merely clone and split existing Gaussian primitives and are therefore incapable of reconstructing missing structures. To address these limitations, we propose VAD-GS, a 3DGS framework tailored for geometry recovery in challenging urban scenes. Our method identifies unreliable geometric structures via voxel-based visibility reasoning, selects informative supporting views through diversity-aware view selection, and recovers missing structures via patch-matching-based multi-view stereo (MVS) reconstruction. This design enables the generation of new Gaussian primitives guided by reliable geometric priors, even in regions lacking initial points. Extensive experiments on the Waymo and nuScenes datasets demonstrate that VAD-GS outperforms state-of-the-art 3DGS approaches and significantly improves the quality of reconstructed geometry for both static and dynamic objects. Source code will be released upon publication.
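
Diversity-aware view selection is described here only at a high level: supporting views are selected incrementally so that they are diverse. One plausible reading, sketched below purely as an assumption, is greedy farthest-point sampling on viewing directions. Starting from the reference view, the sketch repeatedly adds the candidate whose direction is angularly farthest from every view chosen so far. It also discards candidates that deviate too far from the reference to share visible texture with it. The criterion, the 60° cutoff, and the function names are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def angle_between(a: Vec3, b: Vec3) -> float:
    """Angle in radians between two non-zero direction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))  # clamp guards against rounding


def select_diverse_views(
    ref_dir: Vec3,
    candidate_dirs: Sequence[Vec3],
    k: int,
    max_angle_to_ref: float = math.radians(60.0),
) -> List[int]:
    """Greedily pick up to k candidate views whose viewing directions are spread out (hypothetical rule)."""
    # Hypothetical overlap constraint: drop views that look at the target too obliquely.
    eligible = [i for i, d in enumerate(candidate_dirs) if angle_between(ref_dir, d) <= max_angle_to_ref]
    selected: List[int] = []
    chosen_dirs: List[Vec3] = [ref_dir]
    while eligible and len(selected) < k:
        # Max-min criterion: distance of each candidate to its closest already-chosen view.
        best = max(eligible, key=lambda i: min(angle_between(candidate_dirs[i], d) for d in chosen_dirs))
        selected.append(best)
        chosen_dirs.append(candidate_dirs[best])
        eligible.remove(best)
    return selected
```

The max-min rule favors a spread of baselines, which generally helps MVS triangulation. By contrast, picking the k views closest to the reference would tend to return near-duplicate viewpoints that add little new geometric information.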

Paper Structure

This paper contains 15 sections, 5 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: A comparison between VAD-GS and StreetGaussians. While both methods achieve comparable rendering quality, VAD-GS demonstrates superior recovery of incomplete or unreliable scene geometry, as evidenced by notable improvements in the rendered depth and normal maps.
  • Figure 2: VAD-GS pipeline. For each static or dynamic instance with incomplete geometry, VAD-GS first performs voxel-based visibility reasoning to identify a set of potential observation views. It then incrementally selects diverse supporting views to perform MVS reconstruction. The resulting geometric priors are subsequently used for Gaussian densification and optimization.
  • Figure 3: Voxel-based visibility reasoning. (a) Red points are visible, whereas blue points, captured from other views, are invisible in the reference view. (b) The invisibility of blue points may result from occlusions or insufficient sampling rays in the reference view. (c) Rasterizing the distances and indices of visible voxels (in green) yields dense depth maps and accurate pixel-voxel mapping.
  • Figure 4: View Selection and MVS Reconstruction. Image patches are warped across views to check the consistency of depth, normal, and color. Only consistently matched patches (in red) are considered valid for MVS reconstruction, while inconsistent ones (in blue) are discarded. The reconstructed geometry is then used to guide Gaussian densification.
  • Figure 5: Qualitative comparisons between VAD-GS and other state-of-the-art (SoTA) approaches on the nuScenes dataset.
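
Figure 4 describes keeping only patches whose depth, normal, and color agree after warping across views. The sketch below shows a minimal per-patch check. It assumes the warped patch has already been sampled from each source view; the warping itself, for example via a plane-induced homography, is omitted. The thresholds, the use of normalized cross-correlation (NCC) as the color test, the `min_views` rule, and all names are illustrative assumptions rather than the settings used by VAD-GS.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class PatchSample:
    """Hypothetical container: depth, unit normal, and grayscale pixels of a patch in one view."""
    depth: float
    normal: Vec3
    pixels: Sequence[float]


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # textureless patch: the color test is uninformative, so it fails by default
    return sum(x * y for x, y in zip(da, db)) / denom


def is_consistent(
    ref: PatchSample,
    src: PatchSample,
    max_rel_depth: float = 0.01,
    max_normal_deg: float = 10.0,
    min_ncc: float = 0.7,
) -> bool:
    """Check depth, normal, and color agreement between a reference patch and its warped counterpart."""
    rel_depth = abs(ref.depth - src.depth) / ref.depth  # assumes ref.depth > 0
    cos_angle = max(-1.0, min(1.0, sum(x * y for x, y in zip(ref.normal, src.normal))))
    normal_deg = math.degrees(math.acos(cos_angle))
    return (
        rel_depth <= max_rel_depth
        and normal_deg <= max_normal_deg
        and ncc(ref.pixels, src.pixels) >= min_ncc
    )


def is_valid_for_mvs(ref: PatchSample, sources: Sequence[PatchSample], min_views: int = 2) -> bool:
    """Keep a patch (a red patch in Figure 4) only if enough source views agree with it."""
    return sum(is_consistent(ref, s) for s in sources) >= min_views
```

NCC is used here because it is invariant to affine brightness changes between views, which are common across frames in driving sequences. Requiring agreement from several views (`min_views`) trades coverage for reliability before the reconstructed points are used to seed new Gaussians.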