Enhancing 3D Lane Detection and Topology Reasoning with 2D Lane Priors

Han Li, Zehao Huang, Zitian Wang, Wenge Rong, Naiyan Wang, Si Liu

TL;DR

This work addresses the challenge of accurate 3D lane detection and reliable topology reasoning in driving scenes. It introduces Topo2D, a Transformer-based framework that leverages 2D lane priors to initialize 3D lane queries and 3D positional embeddings, and explicitly fuses 2D lane features into topology predictions between lane centerlines and traffic elements. The approach achieves state-of-the-art performance on OpenLane-V2 multi-view topology reasoning (OLS 44.5%) and OpenLane single-view 3D lane detection (F-Score 62.6%), significantly outperforming prior methods by incorporating 2D priors throughout detection and reasoning. By bridging 2D lane priors with 3D perception, Topo2D improves recall, topology accuracy, and robustness across challenging scenarios, advancing online HD map construction and autonomous driving perception.

Abstract

3D lane detection and topology reasoning are essential tasks in autonomous driving scenarios, requiring not only the detection of accurate 3D coordinates on lane lines but also reasoning about the relationships between lanes and traffic elements. Current vision-based methods, whether or not they explicitly construct BEV features, all establish the lane anchors/queries in 3D space while ignoring 2D lane priors. In this study, we propose Topo2D, a novel Transformer-based framework that leverages 2D lane instances to initialize 3D queries and 3D positional embeddings. Furthermore, we explicitly incorporate 2D lane features into the recognition of topology relationships among lane centerlines and between lane centerlines and traffic elements. Topo2D achieves 44.5% OLS on the multi-view topology reasoning benchmark OpenLane-V2 and 62.6% F-Score on the single-view 3D lane detection benchmark OpenLane, exceeding the performance of existing state-of-the-art methods.

Paper Structure (33 sections, 14 equations, 9 figures, 6 tables)

Figures (9)

  • Figure 1: (a) Previous methods randomly initialize 3D lane queries in 3D space. (b) Our method initializes 3D lane queries given 2D lane priors. (c) Comparison of lane detection recall under different thresholds. In both 2D recall and 3D recall, our model shows marked advancement relative to the baseline MapTR across various thresholds.
  • Figure 2: The overall architecture of Topo2D. Given multi-view images, the images are first fed to the backbone network and FPN to extract image features. The image features are then fed into the subsequent 2D lane detector and 2D traffic element detector. The 3D lane detector initializes 3D lane queries and 3D position embeddings based on 2D lane priors and outputs 3D lane detection results. Finally, the 2D lane features are fused with the 3D lane features, and their relationships are estimated based on the fused lane features and traffic element features.
  • Figure 3: Illustration of topology prediction heads. First, we embed the 2D lane instance queries using MLPs and add them to the embedded 3D lane instance queries. Then we concatenate the lane queries with each other in pairs, as well as the lane queries with the traffic element queries in pairs, to predict the topology relationships. Additionally, for lane-lane topology, we incorporate embeddings of the 3D coordinates of lane points, while for lane-traffic element topology, we incorporate embeddings of camera parameters.
  • Figure 4: Visualization of 2D and 3D lane detection results on OpenLane-V2 subset_A. (a) is an intersection scene in which our method accurately detects the positions of all centerlines. (b) is a failure case where our method predicts centerlines that align well with the ground truths except for less precise starting/ending points. Ground truths are shown in red, while predictions are shown in green. Best viewed in color.
  • Figure 5: Visualization of topology reasoning results on OpenLane-V2 subset_A. Left: Lane-traffic element topology. Right: Lane-lane topology. Ground truths are shown in red, while predictions are shown in green. Best viewed in color.
  • ...and 4 more figures
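
The topology heads described in the Figure 3 caption can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names (`mlp`, `init_mlp`, `fuse_lane_queries`, `pairwise_concat`, `topology_scores`), the two-layer MLP shape, the feature sizes, and the sigmoid output are all hypothetical choices. The embeddings of 3D lane point coordinates and of camera parameters mentioned in the caption are left out, because the caption does not say how they are combined with the queries.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(d_in, d_hidden, d_out):
    """Random weights for a two-layer MLP (hypothetical sizes, untrained)."""
    return (rng.normal(0.0, 0.1, (d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(0.0, 0.1, (d_hidden, d_out)), np.zeros(d_out))

def mlp(x, w1, b1, w2, b2):
    """Two-layer perceptron with ReLU, applied over the last axis."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def fuse_lane_queries(q2d, q3d, params_2d, params_3d):
    """Embed 2D and 3D lane queries with separate MLPs, then add them."""
    return mlp(q2d, *params_2d) + mlp(q3d, *params_3d)

def pairwise_concat(a, b):
    """Concatenate every row of a with every row of b: (N, D), (M, D) -> (N, M, 2D)."""
    n, m = a.shape[0], b.shape[0]
    a_rep = np.broadcast_to(a[:, None, :], (n, m, a.shape[1]))
    b_rep = np.broadcast_to(b[None, :, :], (n, m, b.shape[1]))
    return np.concatenate([a_rep, b_rep], axis=-1)

def topology_scores(pairs, head_params):
    """Map each concatenated pair to a connection probability (sigmoid is an assumption)."""
    logits = mlp(pairs, *head_params)[..., 0]
    return 1.0 / (1.0 + np.exp(-logits))

# Toy inputs: 4 lanes, 3 traffic elements, 8-dimensional queries (all assumed).
num_lanes, num_te, dim = 4, 3, 8
q2d = rng.normal(size=(num_lanes, dim))   # 2D lane instance queries
q3d = rng.normal(size=(num_lanes, dim))   # 3D lane instance queries
te = rng.normal(size=(num_te, dim))       # traffic element queries

lane = fuse_lane_queries(q2d, q3d, init_mlp(dim, 16, dim), init_mlp(dim, 16, dim))
lane_lane = topology_scores(pairwise_concat(lane, lane), init_mlp(2 * dim, 16, 1))
lane_te = topology_scores(pairwise_concat(lane, te), init_mlp(2 * dim, 16, 1))

print(lane_lane.shape, lane_te.shape)  # (4, 4) (4, 3)
```

Each entry `lane_lane[i, j]` scores the ordered pair (lane i, lane j), and each entry `lane_te[i, k]` scores lane i against traffic element k. With untrained random weights the scores carry no meaning; the sketch only shows how the fused queries are paired up and the shapes that result.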