BEV-GS: Feed-forward Gaussian Splatting in Bird's-Eye-View for Road Reconstruction

Wenhua Wu; Tong Zhao; Chensheng Peng; Lei Yang; Yintao Wei; Zhe Liu; Hesheng Wang

TL;DR

BEV-GS tackles real-time road surface reconstruction from a single image by coupling a BEV-based feed-forward framework with grid-based Gaussian splatting. It decouples geometry and texture prediction into separate BEV branches and uses their outputs to populate a grid Gaussian representation of the road surface, enabling fast, differentiable novel-view synthesis without per-scene iterative optimization. On the real-world RSRD dataset, it achieves an elevation error of $1.73\,\mathrm{cm}$ and a novel-view-synthesis PSNR of $28.36\,\mathrm{dB}$, while running prediction at $26$ FPS and rendering at about $2061$ FPS. The results indicate that BEV-based geometry-texture decoupling combined with grid Gaussians yields accurate, renderable road surfaces from a single image, with potential for online road condition previews and autonomous driving testing.

Abstract

The road surface is the sole contact medium for wheels or robot feet, so reconstructing it is crucial for unmanned vehicles and mobile robots. Recent studies on Neural Radiance Fields (NeRF) and Gaussian Splatting (GS) have achieved remarkable results in scene reconstruction. However, they typically rely on multi-view image inputs and require prolonged optimization. In this paper, we propose BEV-GS, a real-time single-frame road surface reconstruction method based on feed-forward Gaussian splatting. BEV-GS consists of a prediction module and a rendering module. The prediction module introduces separate geometry and texture networks following the Bird's-Eye-View (BEV) paradigm. Geometric and texture parameters are estimated directly from a single frame, avoiding per-scene optimization. In the rendering module, we use grid Gaussians for road surface representation and novel view synthesis, which better aligns with road surface characteristics. Our method achieves state-of-the-art performance on the real-world dataset RSRD. The road elevation error is reduced to 1.73 cm, and the PSNR of novel view synthesis reaches 28.36 dB. Prediction and rendering run at 26 FPS and 2061 FPS, respectively, enabling high-accuracy, real-time applications. The code will be available at: https://github.com/cat-wwh/BEV-GS
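
To make the grid Gaussian idea concrete, the following is a minimal sketch of how per-cell BEV predictions could be turned into Gaussian parameters. It is an illustration under stated assumptions, not the authors' implementation: the helper name `grid_gaussians_from_bev`, the BEV extents, the flat per-cell scales, and the use of plain RGB colors are all hypothetical, and the actual method may predict additional parameters such as opacity or rotation.

```python
import numpy as np

def grid_gaussians_from_bev(elevation, color, x_range=(0.0, 2.0), y_range=(-1.0, 1.0)):
    """Place one Gaussian per BEV cell (hypothetical helper, not the paper's code).

    elevation: (rows, cols) road height per cell, e.g. from a geometry branch.
    color:     (rows, cols, 3) RGB per cell, e.g. from a texture branch.
    x_range, y_range: assumed metric extent of the BEV grid.
    """
    rows, cols = elevation.shape
    cell_x = (x_range[1] - x_range[0]) / rows
    cell_y = (y_range[1] - y_range[0]) / cols

    # The horizontal positions are fixed at the cell centers; only the
    # height of each Gaussian comes from the predicted elevation.
    xs = x_range[0] + (np.arange(rows) + 0.5) * cell_x
    ys = y_range[0] + (np.arange(cols) + 0.5) * cell_y
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    means = np.stack([gx, gy, elevation], axis=-1).reshape(-1, 3)

    # Assumption: each Gaussian covers about one cell and is thin vertically.
    scales = np.tile([cell_x / 2, cell_y / 2, 1e-3], (rows * cols, 1))
    colors = color.reshape(-1, 3)
    return means, scales, colors

# Usage with dummy predictions on a 4 x 8 grid.
elevation = np.zeros((4, 8))
color = np.ones((4, 8, 3))
means, scales, colors = grid_gaussians_from_bev(elevation, color)
```

Because the horizontal layout is fixed by the grid, a network in this setup only has to regress per-cell quantities, which is one plausible reason a feed-forward prediction can replace per-scene optimization.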
