BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight
Hang Wu, Zhenghao Zhang, Siyuan Lin, Tong Qin, Jin Pan, Qiang Zhao, Chunjing Xu, Ming Yang
TL;DR
BLOS-BEV addresses the limited perceptual range of BEV segmentation by fusing lightweight SD map priors with surround-view BEV features, extending scene understanding beyond line of sight to distances of up to $200$m. The method combines a BEV backbone (LSS), an SD map encoder, and a BEV fusion module that supports three fusion schemes: addition, concatenation, and cross-attention. In experiments on nuScenes and Argoverse, BLOS-BEV achieves state-of-the-art BEV segmentation at both short range ($0\sim50$m) and long range ($50\sim200$m), with long-range gains of over $20\%$ mIoU relative to other methods. By using geospatial priors from OpenStreetMap to extend the perceptual horizon, the approach offers a practical path to better planning and safety in autonomous driving; localization error and map alignment remain limitations and are left for future work.
Abstract
Bird's-eye-view (BEV) representation is crucial for perception in autonomous driving, yet it is difficult to balance the accuracy, efficiency, and range of a BEV representation. Existing works are restricted to a perception range of about 50 meters. Extending this range can greatly benefit downstream tasks such as topology reasoning, scene understanding, and planning by providing more comprehensive information and longer reaction time. Standard-definition (SD) navigation maps provide a lightweight representation of road structure and topology, and they are easy to acquire and cheap to maintain. An intuitive idea is to combine close-range visual information from onboard cameras with beyond line-of-sight (BLOS) environmental priors from SD maps to expand perceptual capability. In this paper, we propose BLOS-BEV, a novel BEV segmentation model that incorporates SD maps for accurate beyond line-of-sight perception at distances of up to 200m. Our approach is applicable to common BEV architectures and achieves excellent results by incorporating information derived from SD maps. We explore several feature fusion schemes for integrating the visual BEV representation with semantic features from the SD map, so as to make the best use of the complementary information in the two sources. Extensive experiments demonstrate that our approach achieves state-of-the-art BEV segmentation performance on the nuScenes and Argoverse benchmarks. With multi-modal inputs, BEV segmentation improves significantly at close range (below 50m) and shows superior performance at long range, surpassing other methods by over 20% mIoU at distances of 50-200m.
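To make the three fusion schemes named in the TL;DR (addition, concatenation, and cross-attention) concrete, the following is a minimal NumPy sketch rather than the authors' implementation. It assumes that the visual BEV features and the SD map features have already been encoded onto the same $C \times H \times W$ grid. All function names, weight matrices, and shapes are hypothetical, and the choice of BEV features as queries with SD map features as keys and values is an assumption made only for illustration.

```python
import numpy as np


def fuse_add(bev, sd):
    """Element-wise addition of two (C, H, W) feature maps."""
    return bev + sd


def fuse_concat(bev, sd, w_proj):
    """Concatenate along channels, then project 2C -> C (a 1x1 convolution).

    w_proj is a hypothetical projection matrix of shape (C, 2C).
    """
    stacked = np.concatenate([bev, sd], axis=0)  # (2C, H, W)
    return np.einsum("oc,chw->ohw", w_proj, stacked)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_cross_attention(bev, sd, w_q, w_k, w_v):
    """Single-head cross-attention with a residual connection.

    Assumption: BEV cells are the queries and SD map cells are the keys
    and values. w_q, w_k, w_v are hypothetical (C, C) projections.
    """
    c, h, w = bev.shape
    q_tokens = bev.reshape(c, h * w).T   # (HW, C), one token per grid cell
    kv_tokens = sd.reshape(c, h * w).T   # (HW, C)
    q = q_tokens @ w_q
    k = kv_tokens @ w_k
    v = kv_tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)  # (HW, HW) attention weights
    fused = q_tokens + attn @ v                    # residual keeps visual features
    return fused.T.reshape(c, h, w)


# Toy inputs: 4 channels on a 5 x 6 grid, random values for illustration only.
rng = np.random.default_rng(0)
bev_feat = rng.standard_normal((4, 5, 6))
sd_feat = rng.standard_normal((4, 5, 6))
w = rng.standard_normal((4, 4))
fused = fuse_cross_attention(bev_feat, sd_feat, w, w, w)
print(fused.shape)
```

Addition and concatenation fuse features cell by cell, so they rely on the SD map being well aligned with the BEV grid. Cross-attention lets each BEV cell weight SD map cells at other locations, which may be more tolerant of the localization and map-alignment errors noted as limitations, at the cost of an attention matrix that grows quadratically with the number of grid cells.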
