BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight

Hang Wu, Zhenghao Zhang, Siyuan Lin, Tong Qin, Jin Pan, Qiang Zhao, Chunjing Xu, Ming Yang

TL;DR

BLOS-BEV tackles the limited perceptual range of BEV segmentation by fusing lightweight SD map priors with surround-view BEV features to achieve beyond line-of-sight understanding up to $200$m. The method integrates a BEV backbone (LSS), an SD map encoder, and a BEV fusion module that supports addition, concatenation, and cross-attention fusion, enabling effective multi-modal reasoning. Through extensive experiments on nuScenes and Argoverse, BLOS-BEV demonstrates state-of-the-art BEV segmentation at both short ($0\sim50$m) and long-range ($50\sim200$m) intervals, with long-range gains up to approximately $20\%$ mIoU. The approach provides a practical path to enhanced planning and safety in autonomous driving by leveraging geospatial priors from OpenStreetMap to extend perceptual horizons, while acknowledging localization and map-alignment limitations as avenues for future work.

Abstract

Bird's-eye-view (BEV) representation is crucial for the perception function in autonomous driving, but it is difficult to balance its accuracy, efficiency, and range. Existing works are restricted to a perception range within 50 meters. Extending the BEV representation range can greatly benefit downstream tasks such as topology reasoning, scene understanding, and planning by offering more comprehensive information and more reaction time. Standard-Definition (SD) navigation maps provide a lightweight representation of road structure topology and are easy to acquire and cheap to maintain. An intuitive idea is to combine the close-range visual information from onboard cameras with the beyond line-of-sight (BLOS) environmental priors from SD maps to extend the perception range. In this paper, we propose BLOS-BEV, a novel BEV segmentation model that incorporates SD maps for accurate beyond line-of-sight perception, up to 200m. Our approach is applicable to common BEV architectures and achieves excellent results by incorporating information derived from SD maps. We explore several feature fusion schemes for integrating the visual BEV representations with semantic features from the SD map, aiming to make the best use of the complementary information from both sources. Extensive experiments demonstrate that our approach achieves state-of-the-art performance in BEV segmentation on the nuScenes and Argoverse benchmarks. With multi-modal inputs, BEV segmentation is significantly enhanced at close ranges below 50m and also shows superior performance in long-range scenarios, surpassing other methods by over 20% mIoU at distances of 50-200m.
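The short-range and long-range results above are reported as mIoU over distance intervals. As a rough illustration of how such a per-interval score can be computed, the sketch below evaluates IoU only over BEV cells whose longitudinal offset from the ego vehicle falls inside a given band, then averages across classes. The grid layout (rows indexed by distance, `ego_row`, `cell_m`) and the function names are assumptions made for illustration; the paper's exact evaluation protocol is not described in this section.

```python
from typing import Dict, List, Optional

Grid = List[List[int]]  # binary mask for one class; rows run along the driving direction


def banded_iou(pred: Grid, target: Grid, ego_row: int, cell_m: float,
               near_m: float, far_m: float) -> Optional[float]:
    """IoU restricted to rows whose distance from the ego row lies in [near_m, far_m).

    Assumption: each row spans cell_m metres and the ego vehicle sits at ego_row.
    Returns None when neither mask has a positive cell inside the band.
    """
    inter = 0
    union = 0
    for r, (p_row, t_row) in enumerate(zip(pred, target)):
        dist = abs(r - ego_row) * cell_m
        if not (near_m <= dist < far_m):
            continue  # row lies outside the requested distance band
        for p, t in zip(p_row, t_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else None


def banded_miou(preds: Dict[str, Grid], targets: Dict[str, Grid], ego_row: int,
                cell_m: float, near_m: float, far_m: float) -> float:
    """Mean of the per-class banded IoU, skipping classes that are empty in the band."""
    scores = []
    for name in preds:
        score = banded_iou(preds[name], targets[name], ego_row, cell_m, near_m, far_m)
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else float("nan")
```

Calling `banded_miou` once with the band 0 to 50 m and once with 50 to 200 m gives the two kinds of figures the summary refers to, under the layout assumptions stated above.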

Paper Structure

This paper contains 19 sections, 4 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: The BLOS-BEV architecture. BLOS-BEV effectively integrates the complementary information from surround-view images and SD maps. By fusing visual information and geometrical priors, BLOS-BEV produces BEV semantic segmentation that far exceeds the range of previous methods, enabling extended-range scene parsing critical for safe autonomous driving. The video demonstration can be found at: https://youtu.be/dPP0_mCzek4.
  • Figure 2: Pipeline of the BLOS-BEV model. The surround-view camera images from the ego vehicle along with a rasterized SD map are fed as inputs. The SD map provides the key road topology. BLOS-BEV effectively fuses the visual features and map encodings through a BEV fusion module. By integrating complementary information from images and maps, BLOS-BEV produces beyond line-of-sight BEV segmentation that substantially exceeds the range of previous methods.
  • Figure 3: Comparison of original and rasterized SD maps. The rasterization retains only the key road layout, reducing clutter while providing the essential environmental context for BEV scene understanding. This illustrates the map preprocessing and rasterization used to generate a clean topological representation as input to the SD map encoder.
  • Figure 4: Alternative techniques explored for fusing BEV features and SD map representations in BLOS-BEV. (a) Element-wise addition of BEV and map encodings. (b) Concatenation of BEV and map features along channel dimension, followed by $3\times3$ convolutions to reduce channels. (c) Cross-attention mechanism where map encodings query visual BEV features.
  • Figure 5: Projection of nuScenes data onto aligned SD map coordinates, visualized for a local area. The lane and road segment annotations from one nuScenes sequence are transformed and visualized on the SD map.
  • ...and 2 more figures
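
The three fusion alternatives listed for Figure 4 can be sketched on a flattened BEV grid, where each cell carries one feature vector. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the concatenation branch uses a per-cell linear projection as a stand-in for the $3\times3$ convolutions mentioned in the caption, and the cross-attention branch omits the learned query, key, and value projections a real model would have.

```python
import math
from typing import List

Vec = List[float]
Cells = List[Vec]  # flattened BEV grid: one C-dimensional feature vector per cell


def fuse_add(bev: Cells, sd_map: Cells) -> Cells:
    """(a) Element-wise addition of BEV and map encodings."""
    return [[b + m for b, m in zip(bv, mv)] for bv, mv in zip(bev, sd_map)]


def fuse_concat(bev: Cells, sd_map: Cells, weight: List[Vec]) -> Cells:
    """(b) Concatenate along channels (2C), then project back to C channels.

    Assumption: `weight` has shape C x 2C and stands in for the 3x3 convolutions.
    """
    fused = []
    for bv, mv in zip(bev, sd_map):
        cat = bv + mv  # list concatenation gives the 2C-channel vector
        fused.append([sum(w * x for w, x in zip(row, cat)) for row in weight])
    return fused


def softmax(scores: Vec) -> Vec:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_cross_attention(bev: Cells, sd_map: Cells) -> Cells:
    """(c) Map encodings act as queries; visual BEV features act as keys and values."""
    scale = math.sqrt(len(bev[0]))  # scaled dot-product attention
    fused = []
    for query in sd_map:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in bev]
        weights = softmax(scores)
        fused.append([sum(w * value[c] for w, value in zip(weights, bev))
                      for c in range(len(bev[0]))])
    return fused
```

Addition and concatenation keep a one-to-one correspondence between map and BEV cells, whereas cross-attention lets each map cell draw on visual features from any BEV cell. That difference may matter when the SD map and the BEV grid are not perfectly aligned, which the summary flags as a limitation.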