VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors
Sungwon Hwang, Min-Jung Kim, Taewoong Kang, Jayeon Kang, Jaegul Choo
TL;DR
This work tackles Extrapolated View Synthesis (EVS) for urban scenes captured with forward-facing driving cameras. It introduces VEGS, a method that combines dense LiDAR-based initialization, 3D Gaussian Splatting with dynamic objects, and two learned priors (surface normals and a fine-tuned diffusion model) to improve rendering quality on views outside the training distribution. The approach includes a covariance-guided regularization to prevent cavities and a diffusion-score distillation to inject scene-specific visual priors, yielding improved EVS metrics and coherent scene editing capabilities. Results on the KITTI datasets demonstrate robust EVS performance, highlighting the method's potential for real-time, view-consistent urban scene rendering in applications such as autonomous driving and AR/VR visualization.
Abstract
Neural rendering-based urban scene reconstruction methods commonly rely on images collected from driving vehicles with cameras facing and moving forward. Although these methods can successfully synthesize views similar to the training camera trajectory, directing the novel view outside the training camera distribution does not guarantee on-par performance. In this paper, we tackle the Extrapolated View Synthesis (EVS) problem by evaluating the reconstructions on views such as looking left, right, or downwards with respect to the training camera distribution. To improve rendering quality for EVS, we initialize our model by constructing a dense LiDAR map, and propose to leverage prior scene knowledge from a surface normal estimator and a large-scale diffusion model. Qualitative and quantitative comparisons demonstrate the effectiveness of our methods on EVS. To the best of our knowledge, we are the first to address the EVS problem in urban scene reconstruction. Link to our project page: https://vegs3d.github.io/.
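
To make the notion of an extrapolated view concrete, the following is a minimal sketch of how a training camera pose could be turned to look left, right, or downwards while keeping its position fixed. It is an illustration only, not the evaluation code of the paper: the function names (`rot_x`, `rot_y`, `extrapolate_pose`), the OpenCV-style axis convention (x right, y down, z forward), the 4x4 camera-to-world pose format, and the 30-degree angles are all assumptions made for this example.

```python
import numpy as np

def rot_y(deg):
    """Rotation about the camera y axis (yaw), angle in degrees."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_x(deg):
    """Rotation about the camera x axis (pitch), angle in degrees."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def extrapolate_pose(c2w, yaw_deg=0.0, pitch_deg=0.0):
    """Rotate a camera in its own frame without moving it.

    Assumes a 4x4 camera-to-world matrix with OpenCV-style axes
    (x right, y down, z forward). Under that assumption,
    yaw_deg > 0 looks right and pitch_deg > 0 looks down.
    """
    # With y pointing down, a negative rotation about x tilts the view down.
    local = rot_y(yaw_deg) @ rot_x(-pitch_deg)
    out = np.array(c2w, dtype=float, copy=True)
    out[:3, :3] = out[:3, :3] @ local  # right-multiply: rotate in camera frame
    return out

# Hypothetical training pose: camera at (1, 2, 3) looking along world +z.
train_pose = np.eye(4)
train_pose[:3, 3] = [1.0, 2.0, 3.0]

look_left = extrapolate_pose(train_pose, yaw_deg=-30.0)
look_right = extrapolate_pose(train_pose, yaw_deg=30.0)
look_down = extrapolate_pose(train_pose, pitch_deg=30.0)
```

The third column of the rotation part of each returned pose is the new viewing direction in world coordinates, so rendering from `look_left`, `look_right`, and `look_down` probes the scene from orientations that a forward-facing training trajectory does not cover.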
