D$^2$GS: Dense Depth Regularization for LiDAR-free Urban Scene Reconstruction
Kejing Xia, Jidong Jia, Ke Jin, Yucai Bai, Li Sun, Dacheng Tao, Youjian Zhang
TL;DR
D$^2$GS tackles the practical challenge of LiDAR dependency in urban scene reconstruction with a LiDAR-free pipeline. The pipeline combines dense depth priors from multi-view metric depth predictions with a diffusion-guided Depth Enhancer and a Road Node for robust ground-plane geometry. The method initializes a compact Gaussian representation through Progressive Pruning. It then jointly refines depth and Gaussian geometry using diffusion priors and multi-view consistency, while explicitly regularizing road regions. Extensive experiments on Waymo NOTR Dynamic32 compare D$^2$GS with LiDAR-free methods. Against these methods, D$^2$GS achieves superior image reconstruction and substantially improves dense depth accuracy, with a gain of roughly $53\%$ over depth-regularized baselines. It also remains competitive with LiDAR-supervised approaches. The results underscore the practicality of LiDAR-free urban reconstruction. Future work targets pose-free pipelines to broaden applicability further.
Abstract
Recently, Gaussian Splatting (GS) has shown great potential for urban scene reconstruction in the field of autonomous driving. However, current urban scene reconstruction methods often depend on multimodal sensor inputs, \textit{i.e.}, LiDAR and images. The geometry priors provided by LiDAR point clouds can largely mitigate the ill-posedness of reconstruction. Even so, acquiring accurate LiDAR data remains challenging in practice. i) Precise spatiotemporal calibration between LiDAR and the other sensors is required, because they may not capture data simultaneously. ii) Reprojection errors arise from spatial misalignment, because LiDAR and cameras are mounted at different locations. To avoid the difficulty of acquiring accurate LiDAR depth, we propose D$^2$GS, a LiDAR-free urban scene reconstruction framework. In this work, we obtain geometry priors that are as effective as LiDAR while being denser and more accurate. \textbf{First}, we initialize a dense point cloud by back-projecting multi-view metric depth predictions. This point cloud is then optimized by a Progressive Pruning strategy to improve its global consistency. \textbf{Second}, we jointly refine the Gaussian geometry and the predicted dense metric depth via a Depth Enhancer. Specifically, we leverage diffusion priors from a depth foundation model to enhance the depth maps rendered by the Gaussians. In turn, the enhanced depths provide stronger geometric constraints during Gaussian training. \textbf{Finally}, we improve the accuracy of ground geometry by constraining the shape and normal attributes of Gaussians within road regions. Extensive experiments on the Waymo dataset demonstrate that our method consistently outperforms state-of-the-art methods. It produces more accurate geometry even when compared with methods that use ground-truth LiDAR data.
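The first step above turns multi-view metric depth predictions into a dense point cloud by back-projection, which is standard pinhole-camera geometry. The sketch below illustrates only that lifting step. It assumes a pinhole camera with intrinsics `fx, fy, cx, cy`, a 4x4 camera-to-world pose, and depth stored as z-depth along the optical axis. The function names `backproject_depth` and `fuse_views` and the `max_depth` validity filter are hypothetical and are not taken from the D$^2$GS code. Progressive Pruning is not modeled here.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def backproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    cam_to_world: Sequence[Sequence[float]],
    max_depth: float = float("inf"),
) -> List[Point3]:
    """Lift a metric z-depth map to world-space 3D points (pinhole model, assumed)."""
    points: List[Point3] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            # Hypothetical validity rule: drop non-positive or too-distant depths.
            if not (0.0 < d < max_depth):
                continue
            # Camera-frame point: X_c = d * K^{-1} [u, v, 1]^T.
            xc = (u - cx) * d / fx
            yc = (v - cy) * d / fy
            zc = d
            # World-frame point: X_w = R X_c + t, using the top 3 rows of the pose.
            pw = tuple(
                cam_to_world[i][0] * xc
                + cam_to_world[i][1] * yc
                + cam_to_world[i][2] * zc
                + cam_to_world[i][3]
                for i in range(3)
            )
            points.append(pw)
    return points


def fuse_views(views) -> List[Point3]:
    """Concatenate points from (depth, (fx, fy, cx, cy), cam_to_world) tuples."""
    cloud: List[Point3] = []
    for depth, intrinsics, pose in views:
        cloud.extend(backproject_depth(depth, *intrinsics, pose))
    return cloud


if __name__ == "__main__":
    identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    depth = [[2.0, 2.0], [0.0, 4.0]]
    print(backproject_depth(depth, 1.0, 1.0, 0.0, 0.0, identity))
```

Integer pixel indices serve as pixel coordinates here; some codebases add 0.5 to sample pixel centers instead. `fuse_views` simply concatenates the per-view points. In the pipeline described above, removing points that are inconsistent across views is the job of the subsequent Progressive Pruning step, which this sketch does not attempt to reproduce.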
