
FastRSR: Efficient and Accurate Road Surface Reconstruction from Bird's Eye View

Yuting Zhao, Yuheng Ji, Xiaoshuai Hao, Shuxiao Li

TL;DR

The paper addresses accurate and efficient road surface reconstruction (RSR) from Bird's Eye View (BEV). It introduces Depth-Aware Projection (DAP) to mitigate information loss in the perspective-to-BEV view transformation, along with two BEV-based models, FastRSR-mono and FastRSR-stereo. FastRSR-mono couples DAP with Shuttle-shape Discretization (SD) to generate dense, elevation-aware BEV features from monocular input. FastRSR-stereo augments the stereo BEV pipeline with Spatial Attention Enhancement (SAE) and Confidence Attention Generation (CAG), which boost accuracy while preserving speed. On the RSRD dataset, FastRSR-mono surpasses monocular baselines by over 6.0% in elevation absolute error. FastRSR-stereo achieves at least a 3× speedup over existing stereo methods while attaining the lowest elevation error among BEV stereo models. The framework is end-to-end trainable with LiDAR-based supervision and establishes a strong new baseline for BEV-based road surface analysis. Key contributions include:

- a fast, depth-guided 3D-to-2D projection (DAP) with a pre-computed look-up table;
- a nonuniform elevation discretization strategy (SD);
- two attention-based refinements (SAE and CAG).

Together, these enable accurate elevation reconstruction in BEV at real-time speeds. These gains matter for the safety and comfort of autonomous driving, where road surface conditions must be assessed reliably in dynamic environments.
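
Shuttle-shape Discretization and the final soft-regression step (Figure 2) can be illustrated with a short Python sketch. The exact bin layout is not given here. The sketch therefore assumes that "shuttle-shaped" means bins that are narrowest around zero elevation and widen linearly toward both ends of the range. The function names `shuttle_bin_centers` and `expected_elevation`, and the -0.2 to 0.2 range in the usage lines, are hypothetical. A BEV cell's elevation is taken as the softmax-weighted sum of the bin centers.

```python
import math


def shuttle_bin_centers(z_min, z_max, n_bins):
    """Hypothetical shuttle-shaped elevation bins: narrow near 0, wide at both ends.

    Assumes z_min < 0 < z_max and an even n_bins. On each side of zero, the k-th
    bin (counting outward from zero) has a width proportional to k.
    """
    if n_bins % 2 or not z_min < 0 < z_max:
        raise ValueError("need an even n_bins and z_min < 0 < z_max")
    half = n_bins // 2
    total = half * (half + 1) / 2                     # 1 + 2 + ... + half
    # Fraction of each half-range covered after the first k bins, k = 0..half.
    frac = [k * (k + 1) / 2 / total for k in range(half + 1)]
    edges = [z_min * f for f in reversed(frac)] + [z_max * f for f in frac[1:]]
    return [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]


def softmax(logits):
    m = max(logits)                                   # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def expected_elevation(logits, centers):
    """Elevation as the probability-weighted sum of bin centers (soft argmax)."""
    return sum(p * c for p, c in zip(softmax(logits), centers))


centers = shuttle_bin_centers(-0.2, 0.2, 8)
print([round(c, 3) for c in centers])
# [-0.16, -0.09, -0.04, -0.01, 0.01, 0.04, 0.09, 0.16]
print(expected_elevation([0.0, 0.0, 0.0, 5.0, 1.0, 0.0, 0.0, 0.0], centers))
```

This layout concentrates resolution where most road-surface elevations are assumed to lie. The soft argmax keeps the predicted elevation continuous and differentiable, so it can be trained directly against elevation labels.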

Abstract

Road Surface Reconstruction (RSR) is crucial for autonomous driving, enabling the understanding of road surface conditions. Recently, RSR from the Bird's Eye View (BEV) has gained attention for its potential to enhance performance. However, existing methods for transforming perspective views to BEV face challenges such as information loss and representation sparsity. Moreover, stereo matching in BEV is limited by the need to balance accuracy with inference speed. To address these challenges, we propose two efficient and accurate BEV-based RSR models: FastRSR-mono and FastRSR-stereo. Specifically, we first introduce Depth-Aware Projection (DAP), an efficient view transformation strategy designed to mitigate information loss and sparsity by querying depth and image features to aggregate BEV data within specific road surface regions using a pre-computed look-up table. To optimize accuracy and speed in stereo matching, we design the Spatial Attention Enhancement (SAE) and Confidence Attention Generation (CAG) modules. SAE adaptively highlights important regions, while CAG focuses on high-confidence predictions and filters out irrelevant information. FastRSR achieves state-of-the-art performance, exceeding monocular competitors by over 6.0% in elevation absolute error and providing at least a 3.0x speedup over stereo methods on the RSRD dataset. The source code will be released.
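
The look-up-table idea behind DAP can be sketched in plain Python. Because the camera geometry is fixed, the pixel and depth bin that each voxel projects to can be computed once. At run time, each voxel only gathers the image feature at that pixel and scales it by the predicted probability of its depth bin. This is a minimal single-camera, single-scale sketch under the following assumptions:

- voxel centers are already expressed in the camera frame;
- the camera follows a pinhole model;
- features are sampled at the nearest pixel;
- depth bins are uniform.

`build_lut` and `depth_aware_projection` are hypothetical names, not the authors' code.

```python
def build_lut(voxels, fx, fy, cx, cy, width, height, d_min, d_step, n_depth):
    """Pre-compute, once per camera setup, the (u, v, depth_bin) hit by each voxel.

    voxels: voxel centers (x, y, z) in the camera frame (x right, y down, z forward).
    Entries are None for voxels behind the camera or outside the image/depth range.
    """
    lut = []
    for x, y, z in voxels:
        if z <= 0:
            lut.append(None)                  # behind the camera
            continue
        u = round(fx * x / z + cx)            # nearest pixel column (pinhole model)
        v = round(fy * y / z + cy)            # nearest pixel row
        d = int((z - d_min) // d_step)        # uniform depth bin along the ray
        inside = 0 <= u < width and 0 <= v < height and 0 <= d < n_depth
        lut.append((u, v, d) if inside else None)
    return lut


def depth_aware_projection(lut, feats, depth_prob, channels):
    """Gather each voxel's image feature and weight it by its depth probability.

    feats[v][u] is a C-dim feature; depth_prob[v][u] is a distribution over depth bins.
    """
    out = []
    for entry in lut:
        if entry is None:
            out.append([0.0] * channels)      # voxel not visible: zero feature
            continue
        u, v, d = entry
        w = depth_prob[v][u][d]
        out.append([w * f for f in feats[v][u]])
    return out


# Two voxels on the same camera ray (like b and c in Figure 3), one behind the camera.
voxels = [(0.0, 0.5, 1.0), (0.0, 1.0, 2.0), (0.0, 0.0, -1.0)]
lut = build_lut(voxels, fx=2.0, fy=2.0, cx=1.0, cy=1.0,
                width=4, height=4, d_min=0.5, d_step=1.0, n_depth=3)
feats = [[[1.0, 2.0] for _ in range(4)] for _ in range(4)]
depth_prob = [[[0.7, 0.2, 0.1] for _ in range(4)] for _ in range(4)]
print(lut)                                              # [(1, 2, 0), (1, 2, 1), None]
print(depth_aware_projection(lut, feats, depth_prob, 2))  # [[0.7, 1.4], [0.2, 0.4], [0.0, 0.0]]
```

The two voxels on the same ray receive the same image feature but different depth weights, which is the behavior the abstract attributes to querying depth alongside image features. A fuller version would presumably restrict the voxels to the road-surface region and repeat the gather at each feature scale.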

Paper Structure

This paper contains 18 sections, 13 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Accuracy vs. latency on the RSRD dataset [rsrd]. Left: FastRSR-mono outperforms other state-of-the-art monocular depth estimation methods. Right: FastRSR-stereo surpasses published state-of-the-art stereo matching methods.
  • Figure 2: Overview of FastRSR-mono. Starting from multi-scale image features, we apply Depth-Aware 3D-to-2D Projection (DAP) at each scale to extract rich BEV features. A BEV encoder followed by a Softmax function then yields the elevation distribution. Finally, the predicted elevation map is computed as a linear combination of the shuttle-shaped discretization bins, weighted by their probability scores.
  • Figure 3: Depth-Aware 3D-to-2D Projection Module. Although voxels $b$ and $c$ lie at different elevations, they fall on the same camera ray and would therefore receive identical image features. Using depth information to assign them different weights gives these voxels distinct values.
  • Figure 4: Discretization bin methods.
  • Figure 5: Overview of FastRSR-stereo. Following the paradigm of monocular BEV feature extraction, we first construct the initial group-wise correlation cost volume from the left and right voxel features. Subsequently, we introduce the Spatial Attention Enhancement (SAE) module to adaptively enhance important regions, while the Confidence Attention Generation (CAG) module emphasizes high-confidence predictions and suppresses irrelevant information.
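
Figure 5 describes an initial group-wise correlation cost volume built from the left and right voxel features. Group-wise correlation was introduced in GwcNet (Guo et al., CVPR 2019). It splits the feature channels into groups and takes the mean channel-wise product within each group. The sketch below rests on one reading of the caption: both views are assumed to be projected into the same voxel grid (for example with a DAP-style lookup), so correlation is computed voxel by voxel with no disparity shift. Under this assumption, each voxel's elevation hypothesis is encoded by its position in the grid. `groupwise_correlation` is a hypothetical name, and SAE and CAG are not modeled.

```python
def groupwise_correlation(left, right, num_groups):
    """Per-voxel group-wise correlation (GwcNet-style) between two voxel feature sets.

    left[i] and right[i] are the C-dim features that the left and right images
    project into voxel i. Channels are split into num_groups groups, and each group
    contributes the mean of its channel-wise products.
    """
    cost = []
    for fl, fr in zip(left, right):
        c = len(fl)
        if c != len(fr) or c % num_groups:
            raise ValueError("features must match and C must be divisible by num_groups")
        g = c // num_groups                       # channels per group
        cost.append([sum(fl[j] * fr[j] for j in range(k * g, (k + 1) * g)) / g
                     for k in range(num_groups)])
    return cost


left = [[1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 1.0, 0.0]]
right = [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 0.0, 1.0]]
print(groupwise_correlation(left, right, num_groups=2))   # [[1.5, 3.5], [0.0, 0.0]]
```

Voxels where the two views project agreeing features score high, while voxels with mismatched features score near zero. The attention modules in FastRSR-stereo then refine a volume of this kind.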