U-BEV: Height-aware Bird's-Eye-View Segmentation and Neural Map-based Relocalization
Andrea Boscolo Camiletto, Alfredo Bochicchio, Alexander Liniger, Dengxin Dai, Abel Gawel
TL;DR
The paper tackles reliable image-based relocalization under GPS-denied conditions by introducing U-BEV, a height-aware BEV architecture that reasons over multiple height layers before flattening the BEV features. It couples a lightweight, multi-height encoder–decoder BEV network with a neural encoding of SD-map data and a differentiable template matcher, so the whole relocalization pipeline is trainable end to end. On nuScenes, U-BEV gains approximately $1.7$ to $2.8$ mIoU over a transformer-based BEV baseline of similar computational complexity and improves Recall Accuracy at a $10\,\mathrm{m}$ threshold by about $26.4\%$, while maintaining real-time inference with reduced computational load. Because it matches road-shape structure rather than distinct landmarks, the approach enables robust relocalization in feature-poor or degenerate environments and is practical for lightweight autonomous driving systems.
Abstract
Efficient relocalization is essential for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails. Recent advances in Bird's-Eye-View (BEV) segmentation allow for accurate estimation of local scene appearance and, in turn, can benefit the relocalization of the vehicle. However, one downside of BEV methods is the heavy computation required to leverage the geometric constraints. This paper presents U-BEV, a U-Net-inspired architecture that extends the current state of the art by allowing the BEV to reason about the scene on multiple height layers before flattening the BEV features. We show that this extension boosts the performance of U-BEV by up to 4.11 IoU. Additionally, we combine the encoded neural BEV with a differentiable template matcher to perform relocalization on neural SD-map data. The model is fully end-to-end trainable and outperforms transformer-based BEV methods of similar computational complexity by 1.7 to 2.8 mIoU and BEV-based relocalization by over 26% Recall Accuracy on the nuScenes dataset.
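The abstract names a differentiable template matcher without specifying it. As a minimal sketch of one common way such a matcher can be built, the example below assumes normalized cross-correlation between a local BEV template and a larger map, followed by a softmax and a soft-argmax over positions. The function name `soft_template_match`, the `temperature` parameter, and the single-channel NumPy formulation are illustrative assumptions, not the authors' implementation. NumPy does not compute gradients itself, but every step is a smooth operation that an autodiff framework could differentiate.

```python
import numpy as np


def soft_template_match(search_map, template, temperature=0.02):
    """Hypothetical sketch: locate `template` inside `search_map`.

    Returns a probability map over top-left offsets and the expected
    (row, col) offset under that distribution (a soft-argmax).
    """
    H, W = search_map.shape
    h, w = template.shape
    out_h, out_w = H - h + 1, W - w + 1

    # Zero-mean template, so the score is a normalized cross-correlation.
    t = template - template.mean()
    t_norm = np.linalg.norm(t) + 1e-8

    scores = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = search_map[i:i + h, j:j + w]
            p = patch - patch.mean()
            p_norm = np.linalg.norm(p) + 1e-8
            scores[i, j] = (p * t).sum() / (p_norm * t_norm)

    # Softmax over all offsets; a lower temperature gives a sharper peak.
    logits = scores / temperature
    logits = logits - logits.max()  # numerical stability
    prob = np.exp(logits)
    prob = prob / prob.sum()

    # Soft-argmax: expected offset under the probability map.
    exp_row = (prob.sum(axis=1) * np.arange(out_h)).sum()
    exp_col = (prob.sum(axis=0) * np.arange(out_w)).sum()
    return prob, (exp_row, exp_col)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_map = rng.standard_normal((32, 32))      # stand-in for an encoded map
    toy_bev = toy_map[12:20, 7:15].copy()        # stand-in for a local BEV crop
    prob, (row, col) = soft_template_match(toy_map, toy_bev)
    print(f"estimated offset: ({row:.2f}, {col:.2f})")
```

Replacing a hard argmax with the expectation under a softmax is what lets a localization loss propagate back into both encoders; in this sketch the temperature trades off peak sharpness against gradient smoothness.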
