D$^2$GS: Dense Depth Regularization for LiDAR-free Urban Scene Reconstruction

Kejing Xia, Jidong Jia, Ke Jin, Yucai Bai, Li Sun, Dacheng Tao, Youjian Zhang

TL;DR

D$^2$GS addresses the LiDAR dependency of urban scene reconstruction with a LiDAR-free pipeline that combines dense depth priors from multi-view predictions, a diffusion-guided Depth Enhancer, and a Road Node for robust ground-plane geometry. The method initializes a compact Gaussian representation through Progressive Pruning, then jointly refines depth and Gaussian geometry using diffusion priors and multi-view consistency, while explicitly regularizing road regions. Experiments on Waymo NOTR Dynamic32 show that D$^2$GS outperforms LiDAR-free methods in both image reconstruction and dense depth accuracy (a gain of roughly $53\%$ over depth-regularized baselines) and is competitive with LiDAR-supervised approaches. The results support the practicality of LiDAR-free urban reconstruction, and the authors point to pose-free pipelines as future work to broaden applicability.

Abstract

Recently, Gaussian Splatting (GS) has shown great potential for urban scene reconstruction in the field of autonomous driving. However, current urban scene reconstruction methods often depend on multimodal sensors as inputs, i.e., LiDAR and images. Though the geometry prior provided by LiDAR point clouds can largely mitigate ill-posedness in reconstruction, acquiring such accurate LiDAR data is still challenging in practice: i) precise spatiotemporal calibration between LiDAR and other sensors is required, as they may not capture data simultaneously; ii) reprojection errors arise from spatial misalignment when LiDAR and cameras are mounted at different locations. To avoid the difficulty of acquiring accurate LiDAR depth, we propose D$^2$GS, a LiDAR-free urban scene reconstruction framework. In this work, we obtain geometry priors that are as effective as LiDAR while being denser and more accurate. First, we initialize a dense point cloud by back-projecting multi-view metric depth predictions. This point cloud is then optimized by a Progressive Pruning strategy to improve the global consistency. Second, we jointly refine Gaussian geometry and predicted dense metric depth via a Depth Enhancer. Specifically, we leverage diffusion priors from a depth foundation model to enhance the depth maps rendered by Gaussians. In turn, the enhanced depths provide stronger geometric constraints during Gaussian training. Finally, we improve the accuracy of ground geometry by constraining the shape and normal attributes of Gaussians within road regions. Extensive experiments on the Waymo dataset demonstrate that our method consistently outperforms state-of-the-art methods, producing more accurate geometry even when compared with those using ground-truth LiDAR data.
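
The abstract's first step, back-projecting metric depth predictions into a point cloud, can be illustrated with a minimal sketch. It assumes a standard pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy` and a 4x4 camera-to-world pose; the function name `backproject_depth` and all parameter names are hypothetical and not taken from the paper. The authors' actual depth model, multi-view merging, and Progressive Pruning are not reproduced here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    cam_to_world: Sequence[Sequence[float]],
) -> List[Point]:
    """Lift a metric depth map to world-space points (pinhole model, assumed).

    depth[v][u] is the depth in meters at pixel column u, row v.
    cam_to_world is a 4x4 row-major homogeneous transform.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue
            # Pixel -> camera coordinates via the inverse pinhole projection.
            x_cam = (u - cx) * z / fx
            y_cam = (v - cy) * z / fy
            p_cam = (x_cam, y_cam, z, 1.0)
            # Camera -> world coordinates (first three rows of the pose).
            world = [
                sum(cam_to_world[r][c] * p_cam[c] for c in range(4))
                for r in range(3)
            ]
            points.append((world[0], world[1], world[2]))
    return points


def merge_views(per_view_points: Sequence[List[Point]]) -> List[Point]:
    """Concatenate the point lists of several views into one dense cloud."""
    merged: List[Point] = []
    for pts in per_view_points:
        merged.extend(pts)
    return merged
```

Calling `backproject_depth` once per camera frame and passing the results to `merge_views` yields a dense but unfiltered cloud. In the pipeline described above, a cloud of this kind would then be optimized by Progressive Pruning to improve global consistency, a step this sketch omits.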

Paper Structure

This paper contains 30 sections, 9 equations, 13 figures, 10 tables.

Figures (13)

  • Figure 1: Examples of common LiDAR acquisition issues: a) reprojection error, b) calibration misalignment, and c) missing LiDAR data. OmniRe [chen2024omnire] yields poor performance in both image reconstruction and depth estimation when using these inaccurate LiDAR measurements.
  • Figure 2: Pipeline of D$^2$GS. We first employ a Progressive Pruning strategy to obtain a robust global Gaussian initialization. A Road Node is incorporated into the scene graph structure to regularize the road region using strong geometric priors. During training, Gaussian optimization and depth refinement are performed iteratively, allowing depth to be learned jointly from Gaussian supervision and enhanced by diffusion priors from a pretrained depth foundation model.
  • Figure 3: Comparison of image reconstruction and depth estimation performance on the Waymo NOTR Dynamic32 dataset against S3GS [huang2024textit], PVG [chen2023periodic], OmniRe [chen2024omnire], and LiDAR-free baselines. Zoom in for better visual comparison.
  • Figure 4: Visual comparison of the proposed modules and training strategy. Zoom in for better comparison.
  • Figure 5: Comparison of our method with S3GS, PVG, OmniRe, and LiDAR-free baselines.
  • ...and 8 more figures