
TopoSD: Topology-Enhanced Lane Segment Perception with SDMap Prior

Sen Yang, Minyue Jiang, Ziwei Fan, Xiaolu Xie, Xiao Tan, Yingying Li, Errui Ding, Liang Wang, Jingdong Wang

TL;DR

This work proposes to train the perception model to "see" standard definition maps (SDMaps), and incorporates such complementary features as prior information to improve the bird's eye view (BEV) feature for lane geometry and topology decoding.

Abstract

Recent advances in autonomous driving systems have shifted towards reducing reliance on high-definition maps (HDMaps) due to the huge costs of annotation and maintenance. Instead, researchers are focusing on online vectorized HDMap construction using on-board sensors. However, sensor-only approaches still face challenges in long-range perception due to the restricted views imposed by the mounting angles of onboard cameras, just as human drivers also rely on bird's-eye-view navigation maps for a comprehensive understanding of road structures. To address these issues, we propose to train the perception model to "see" standard definition maps (SDMaps). We encode SDMap elements into neural spatial map representations and instance tokens, and then incorporate such complementary features as prior information to improve the bird's eye view (BEV) feature for lane geometry and topology decoding. Based on the lane segment representation framework, the model simultaneously predicts lanes, centrelines and their topology. To further enhance the ability of geometry prediction and topology reasoning, we also use a topology-guided decoder to refine the predictions by exploiting the mutual relationships between topological and geometric features. We perform extensive experiments on OpenLane-V2 datasets to validate the proposed method. The results show that our model outperforms state-of-the-art methods by a large margin, with gains of +6.7 and +9.1 on the mAP and topology metrics. Our analysis also reveals that models trained with SDMap noise augmentation exhibit enhanced robustness.
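The abstract mentions training with SDMap noise augmentation but does not say here what form the noise takes. As a minimal sketch, assuming the noise is a random rigid shift and rotation applied to all SDMap polylines in the ego frame, such an augmentation might look like the following. The function name `perturb_sdmap` and the parameters `max_shift_m` and `max_rot_deg` are hypothetical and not taken from the paper.

```python
import math
import random


def perturb_sdmap(polylines, max_shift_m=2.0, max_rot_deg=5.0, rng=None):
    """Apply one random rigid transform (rotation + shift) to every polyline.

    polylines: list of polylines, each a list of (x, y) points in metres.
    The noise model (uniform shift and rotation) is an assumption made for
    illustration; the paper's actual noise settings are not given here.
    """
    rng = rng or random.Random()
    dx = rng.uniform(-max_shift_m, max_shift_m)
    dy = rng.uniform(-max_shift_m, max_shift_m)
    theta = math.radians(rng.uniform(-max_rot_deg, max_rot_deg))
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Rotate each point about the ego origin, then translate it.
    return [
        [(cos_t * x - sin_t * y + dx, sin_t * x + cos_t * y + dy) for x, y in line]
        for line in polylines
    ]


if __name__ == "__main__":
    roads = [[(0.0, 0.0), (10.0, 0.0)], [(5.0, -5.0), (5.0, 5.0)]]
    print(perturb_sdmap(roads, rng=random.Random(0)))
```

Because the same transform is applied to every polyline, the SDMap stays internally consistent and is only misaligned with the sensors, which is one plausible way to mimic localization error.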


Paper Structure

This paper contains 18 sections, 4 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Comparison between the previous lane segment perception pipeline and ours. We incorporate SDMap information as a prior to enhance geometry and topology learning.
  • Figure 2: The overall model architecture. The model receives perspective images from cameras arranged in a surrounding view configuration and a locally aligned SDMap as inputs. The images are processed by the image backbone to obtain multi-scale image features. The polylines of SDMap are encoded as two representations -- a 2D-shaped SD feature map and a set of vectorized SD tokens. We adopt a BEVFormer-like encoder to extract BEV features. The SD feature map is added to the BEV queries and BEV features. The SD tokens interact with BEV queries via cross-attention. Then we use a Topology-Guided Decoder to predict the lane segment results. SA denotes the Self-Attention layer.
  • Figure 3: Model performance under different levels of SDMap noise. Each curve represents the same model trained under a particular SDMap noise condition.
  • Figure 4: Visualization results on selected cases. We compare our model with the ground truth and with the predictions of LaneSegNet. * denotes predicted lane segments overlaid with the input SDMap polylines. The blue, black and green lines represent roads, sidewalks and crosswalks in the SDMap.
  • Figure 5: Visualization results on cases where the given SDMap is inconsistent with the lane annotations. For each example, we show 4 sub-figures: GT lane segments, GT lane segments with SDMap, predicted lane segments and predicted lane segments with SDMap.
  • ...and 2 more figures
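
The two SDMap fusion paths described in the Figure 2 caption (the SD feature map added to the BEV queries, and SD tokens attended to via cross-attention) can be illustrated with a small pure-Python sketch. It is not the authors' implementation: it assumes flattened BEV queries, a single attention head with no learned projections, and a residual connection around the cross-attention, none of which are stated in the caption. The names `fuse_sd_prior` and `cross_attend` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, tokens):
    """Single-head scaled dot-product attention without learned projections.

    queries: list of N vectors (length C); tokens: list of M vectors (length C).
    Returns N vectors, each a softmax-weighted average of the tokens.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, t)) / math.sqrt(dim) for t in tokens]
        weights = softmax(scores)
        outputs.append(
            [sum(w * t[k] for w, t in zip(weights, tokens)) for k in range(dim)]
        )
    return outputs


def fuse_sd_prior(bev_queries, sd_feature_map, sd_tokens):
    """Combine both SDMap representations with flattened BEV queries.

    bev_queries and sd_feature_map: H*W vectors of length C (same layout).
    sd_tokens: M vectors of length C, one per SDMap polyline instance.
    """
    # Path 1: add the rasterized SD feature map to the BEV queries cell by cell.
    added = [
        [q + s for q, s in zip(q_vec, s_vec)]
        for q_vec, s_vec in zip(bev_queries, sd_feature_map)
    ]
    # Path 2: let each query attend to the SD tokens (residual add is assumed).
    attended = cross_attend(added, sd_tokens)
    return [[a + c for a, c in zip(a_vec, c_vec)] for a_vec, c_vec in zip(added, attended)]


if __name__ == "__main__":
    bev = [[1.0, 0.0], [0.0, 1.0]]       # two BEV cells, C = 2
    sd_map = [[0.5, 0.5], [0.0, 0.0]]    # SD feature map on the same grid
    sd_tok = [[2.0, 2.0], [0.0, 1.0]]    # two SD instance tokens
    print(fuse_sd_prior(bev, sd_map, sd_tok))
```

In a real model both paths would use learned embeddings and projections inside a BEVFormer-like encoder; the sketch only shows how a dense map and a sparse token set can inform the same set of queries.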