RoadBEV: Road Surface Reconstruction in Bird's Eye View

Tong Zhao; Lei Yang; Yichen Xie; Mingyu Ding; Masayoshi Tomizuka; Yintao Wei

TL;DR

This paper addresses road surface elevation reconstruction for autonomous driving by estimating the vertical road profile directly in Bird's Eye View (BEV). It introduces two models, RoadBEV-mono and RoadBEV-stereo, which query image features through a voxelized BEV representation and treat elevation estimation as classification over elevation bins, using a soft-argmin to recover continuous elevation values. On the RSRD dataset, RoadBEV-mono achieves an absolute elevation error of about $1.83\,\text{cm}$ and RoadBEV-stereo about $0.50\,\text{cm}$; the stereo model's accuracy gain comes at the cost of higher computation. Working on a voxel-centric BEV volume, built from correlation-based cost volumes in the stereo case, suppresses perspective distortion and tightly constrains elevation estimation. The results suggest practical viability for road preview in autonomous systems and open avenues for sequence-based and joint texture-geometry reconstruction.
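The bin-based formulation with a soft-argmin can be illustrated with a minimal sketch. It assumes each BEV grid cell yields one score per elevation bin, with higher scores meaning a more likely bin; if the scores were matching costs, their sign would be flipped first. The function name, the bin centers, and the scores are hypothetical and are not taken from the released code.

```python
import math


def soft_argmin_elevation(scores, bin_centers_cm):
    """Turn per-bin scores into one continuous elevation (in cm).

    The scores are passed through a softmax, and the result is the
    probability-weighted mean of the bin centers.
    """
    if len(scores) != len(bin_centers_cm):
        raise ValueError("need exactly one score per elevation bin")
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return sum(w / total * c for w, c in zip(weights, bin_centers_cm))


# Hypothetical example: five bins spanning -2 cm to +2 cm.
bin_centers_cm = [-2.0, -1.0, 0.0, 1.0, 2.0]
scores = [0.1, 0.3, 2.0, 2.0, 0.2]
elevation_cm = soft_argmin_elevation(scores, bin_centers_cm)
```

Because the two middle bins share the highest score, the estimate falls between their centers instead of snapping to one of them, which is how a classification head can still produce sub-bin elevation values.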

Abstract

Road surface conditions, especially geometry profiles, strongly affect the driving performance of autonomous vehicles. Vision-based online road reconstruction can capture road information in advance. Existing solutions such as monocular depth estimation and stereo matching deliver only modest performance. The recent technique of Bird's-Eye-View (BEV) perception offers great potential for more reliable and accurate reconstruction. This paper proposes two simple yet effective models for road elevation reconstruction in BEV, named RoadBEV-mono and RoadBEV-stereo, which estimate road elevation from monocular and stereo images, respectively. The former directly fits elevation values based on voxel features queried from the image view, while the latter efficiently recognizes road elevation patterns based on a BEV volume representing the correlation between left and right voxel features. Insightful analyses reveal their consistency with and differences from the perspective view. Experiments on a real-world dataset verify the models' effectiveness and superiority. The elevation errors of RoadBEV-mono and RoadBEV-stereo are 1.83 cm and 0.50 cm, respectively. Our models are promising for practical road preview, providing essential information for promoting the safety and comfort of autonomous vehicles. The code is released at https://github.com/ztsrxh/RoadBEV
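To make the idea of a BEV volume built from left-right voxel correlation concrete, the following sketch scores each voxel in one vertical column by the channel-averaged product of its left and right features. This plain correlation is an assumption chosen for illustration; the paper's exact correlation form may differ, and all names and feature values here are hypothetical.

```python
def voxel_correlation(left_feat, right_feat):
    """Channel-averaged product of the left and right features of one voxel."""
    if len(left_feat) != len(right_feat):
        raise ValueError("left and right features need the same channel count")
    return sum(l * r for l, r in zip(left_feat, right_feat)) / len(left_feat)


def column_correlation(left_column, right_column):
    """Correlation score for every voxel stacked at one BEV grid cell."""
    return [voxel_correlation(l, r) for l, r in zip(left_column, right_column)]


# Toy column of three stacked voxels with 2-channel features.
# Only the middle voxel sees matching features in both views.
left_column = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
right_column = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]

scores = column_correlation(left_column, right_column)
best_bin = max(range(len(scores)), key=scores.__getitem__)
```

The intuition is that a voxel lying on the true road surface projects to the same physical point in both images, so its left and right features agree and its correlation peaks; voxels above or below the surface project to different points and correlate weakly.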

Paper Structure (16 sections, 3 equations, 12 figures, 4 tables)

Introduction
Related Works
Dataset and Pre-processing
Methods
Feature Voxel and Elevation Regression
RoadBEV-mono
RoadBEV-stereo
Loss Functions
Experiments
Implementation Details
Performance and Comparison
Visualization of Road Reconstruction
Ablation Studies for RoadBEV-mono
Ablation Studies for RoadBEV-stereo
Limitations and Prospects
...and 1 more section

Figures (12)

Figure 1: Our motivation. (a) Our reconstruction methods in BEV outperform those in image view for both monocular and stereo configurations. (b) For depth estimation in image view, the search direction deviates from the road elevation direction. Road profile features are sparse in the depth view, and the pothole is not clearly identifiable. (c) In BEV, profile variations such as the pothole, the roadside step, and even the rut are precisely captured. Road elevation features in the vertical direction are denser and easier to recognize.
Figure 2: Illustration of coordinates and the generation of GT elevation labels. (a) Coordinates. (b) ROI in image view. (c) ROI in BEV. (d) Generation of GT labels in grids.
Figure 3: Examples of road image and GT elevation map. The unit of colorbar is cm.
Figure 4: Feature voxels of interest in image view. Centers of stacked voxels at the same horizontal location are projected as pixels on the red line segment.
Figure 5: Architecture of RoadBEV-mono. We utilize 3D to 2D projection to query pixel features. The elevation estimation head utilizes 2D convolution to extract features on the reshaped BEV feature.
...and 7 more figures
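The 3D-to-2D query mentioned in the captions of Figures 4 and 5 can be sketched with an ideal pinhole camera. The sketch assumes a camera frame with x to the right, y downward, and z forward, and an image plane aligned with the voxel grid; the intrinsics and voxel ranges are hypothetical values, not the dataset's calibration.

```python
def column_centers(x, z, y_min, y_max, n_bins):
    """Centers of n_bins voxels stacked vertically at horizontal location (x, z)."""
    step = (y_max - y_min) / n_bins
    return [(x, y_min + (i + 0.5) * step, z) for i in range(n_bins)]


def project_to_pixels(centers_xyz, fx, fy, cx, cy):
    """Project camera-frame 3D points to pixel coordinates (u, v).

    Points at or behind the camera plane (z <= 0) map to None.
    """
    pixels = []
    for x, y, z in centers_xyz:
        if z <= 0:
            pixels.append(None)
            continue
        pixels.append((fx * x / z + cx, fy * y / z + cy))
    return pixels


# Hypothetical setup: one column 5 m ahead and 0.5 m to the right,
# covering 1.0 m to 1.4 m below the camera with 4 voxels.
centers = column_centers(x=0.5, z=5.0, y_min=1.0, y_max=1.4, n_bins=4)
pixels = project_to_pixels(centers, fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
```

Under these assumptions every voxel in the column shares the same u coordinate while v changes with elevation, so the projected centers trace a short line segment in the image, matching the description in Figure 4. Image features would then be sampled at these pixel locations, for example by bilinear interpolation.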
