BEV-GS: Feed-forward Gaussian Splatting in Bird's-Eye-View for Road Reconstruction

Wenhua Wu, Tong Zhao, Chensheng Peng, Lei Yang, Yintao Wei, Zhe Liu, Hesheng Wang

TL;DR

BEV-GS tackles real-time road surface reconstruction from a single image by coupling a BEV-based feed-forward framework with grid-based Gaussian splatting. It decouples geometry and texture prediction into separate BEV branches and uses their outputs to initialize a grid Gaussian representation of the road surface, enabling fast, differentiable novel-view synthesis without per-scene iterative optimization. On the real-world RSRD dataset, it achieves an elevation error of $1.73\,\mathrm{cm}$ and a rendering PSNR of $28.36\,\mathrm{dB}$, with real-time prediction at $26$ FPS and rendering at about $2061$ FPS. The results indicate that BEV-based prediction, geometry-texture decoupling, and grid Gaussians together yield accurate, renderable road surfaces from a single image, with potential uses in online road condition previews and autonomous driving testing.

Abstract

The road surface is the sole contact medium for wheels and robot feet, so reconstructing it is crucial for unmanned vehicles and mobile robots. Recent studies on Neural Radiance Fields (NeRF) and Gaussian Splatting (GS) have achieved remarkable results in scene reconstruction. However, they typically rely on multi-view image inputs and require prolonged optimization. In this paper, we propose BEV-GS, a real-time single-frame road surface reconstruction method based on feed-forward Gaussian splatting. BEV-GS consists of a prediction module and a rendering module. The prediction module introduces separate geometry and texture networks following the Bird's-Eye-View (BEV) paradigm. Geometric and texture parameters are estimated directly from a single frame, avoiding per-scene optimization. In the rendering module, we use grid Gaussians for road surface representation and novel view synthesis, a representation that better matches road surface characteristics. Our method achieves state-of-the-art performance on the real-world dataset RSRD: the road elevation error is reduced to 1.73 cm, and the PSNR of novel view synthesis reaches 28.36 dB. Prediction and rendering run at 26 FPS and 2061 FPS, respectively, enabling high-accuracy, real-time applications. The code will be available at: https://github.com/cat-wwh/BEV-GS
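
To make the reported quantities concrete, the following is a minimal sketch of how PSNR, an elevation error in centimetres, and per-frame latency could be computed. It is not the authors' evaluation code. The function names are hypothetical, and the use of mean absolute error for elevation is an assumption, since the abstract does not define the exact metric.

```python
import math


def psnr(rendered, reference, max_value=1.0):
    """PSNR in dB between two equally sized flat lists of pixel intensities."""
    if len(rendered) != len(reference) or not rendered:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_abs_elevation_error_cm(pred_m, gt_m):
    """Mean absolute elevation error in cm (assumed metric), inputs in metres."""
    if len(pred_m) != len(gt_m) or not pred_m:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs(p - g) for p, g in zip(pred_m, gt_m)) / len(pred_m)


def latency_ms(fps):
    """Per-frame latency in milliseconds implied by a throughput in FPS."""
    return 1000.0 / fps


# Latencies implied by the throughputs quoted in the abstract.
print(f"prediction: {latency_ms(26):.1f} ms per frame")
print(f"rendering:  {latency_ms(2061):.2f} ms per frame")
```

Read this way, 26 FPS corresponds to roughly 38 ms per predicted frame, and 2061 FPS to under half a millisecond per rendered view.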

Paper Structure

This paper contains 20 sections, 13 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: BEV-GS is capable of real-time road surface reconstruction from a single image. Unlike existing per-pixel feed-forward Gaussian prediction methods, it predicts geometry and texture parameters of mesh grid Gaussians based on BEV features. BEV-GS achieves remarkable road surface reconstruction performance and can rapidly render novel views of the road. The images on the right side simulate the captured road as the car moves forward.
  • Figure 2: Overview of BEV-GS. It consists of a prediction module and a rendering module. In the prediction module, two independent geometry and texture branches based on BEV features predict road elevation and spherical harmonic parameters. In the rendering module, we represent road surface with grid Gaussians, which are initialized by the predicted properties. Elevation and color losses separately supervise the feed-forward predictions.
  • Figure 3: Visualization of road reconstruction results under typical scenarios. Compared with RoadBEV [roadbev], our method reconstructs not only more accurate road geometry but also detailed road texture.
  • Figure 4: Visualization of novel view synthesis, where the novel view is the next frame as the vehicle moves forward. For comparison, the rendered results of Splatter Image [szymanowicz2024splatter_image] and Flash3D [szymanowicz2024flash3d] are restricted to the same area as ours.
  • Figure 5: Comparison of segment-wise elevation errors with the SOTA depth estimation and BEV models.
  • ...and 4 more figures
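
The Figure 2 caption describes grid Gaussians initialized from the predicted elevation and spherical harmonic parameters. A minimal sketch of that initialization step follows, under stated assumptions: one Gaussian per BEV cell, centered on the cell at the predicted height, with only the zeroth-order (DC) harmonic term as color. The `GridGaussian` type, the `init_grid_gaussians` function, and the scale, thickness, and opacity choices are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class GridGaussian:
    center: Vec3    # (x, y, z) in metres; z is the predicted elevation
    scale: Vec3     # per-axis extent; assumed to follow the grid spacing
    sh_dc: Vec3     # zeroth-order spherical harmonic (base color) term
    opacity: float


def init_grid_gaussians(
    elevation: Sequence[Sequence[float]],
    sh_dc: Sequence[Sequence[Vec3]],
    cell_size: float,
    origin: Tuple[float, float] = (0.0, 0.0),
    thickness: float = 0.005,  # assumed thin extent along z
    opacity: float = 1.0,      # assumed opaque road surface
) -> List[GridGaussian]:
    """Place one Gaussian at the center of every cell of a BEV grid."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    rows = len(elevation)
    cols = len(elevation[0]) if rows else 0
    if len(sh_dc) != rows:
        raise ValueError("elevation and sh_dc must have the same shape")
    gaussians = []
    for i in range(rows):
        if len(elevation[i]) != cols or len(sh_dc[i]) != cols:
            raise ValueError("elevation and sh_dc must have the same shape")
        for j in range(cols):
            # The grid fixes x and y; only the height comes from the network.
            x = origin[0] + (i + 0.5) * cell_size
            y = origin[1] + (j + 0.5) * cell_size
            gaussians.append(GridGaussian(
                center=(x, y, elevation[i][j]),
                scale=(cell_size / 2, cell_size / 2, thickness),
                sh_dc=sh_dc[i][j],
                opacity=opacity,
            ))
    return gaussians


elevation = [[0.00, 0.01], [0.02, 0.03]]
colors = [[(0.5, 0.5, 0.5)] * 2 for _ in range(2)]
gaussians = init_grid_gaussians(elevation, colors, cell_size=0.1)
print(len(gaussians))  # 4
```

Because the horizontal positions are fixed by the grid in this sketch, the geometry branch only has to supply one height per cell, which is one plausible reading of why a grid layout suits a road surface.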