3D Lane Detection from Front or Surround-View using Joint-Modeling & Matching

Haibin Zhou, Huabing Zhou, Jun Chang, Tao Lu, Jiayi Ma

TL;DR

This work tackles 3D lane detection from front or surround-view imagery by proposing Joint Lane Modeling, which combines Bézier curves with interpolation to represent lanes, coupled with a Global2Local Lane Matching (GL-BK) mechanism and a 3D Spatial Encoder to predict 3D lane key points. The network architecture integrates a ResNet-50 backbone, voxel-like 3D sampling via Deformable Attention, and a Map Decoder that yields both key-point and Bézier-control-point features for matching. The method reports state-of-the-art performance on Openlane front-view 3D lane detection and competitive results on Argoverse2 surround-view lane detection, alongside an exploration of 3D surround-view lane detection. Because it outputs 3D key points directly rather than relying on anchors, the framework can represent complex road geometries such as closed-loop or U-shaped lanes.

Abstract

3D lanes offer a more comprehensive understanding of the road surface geometry than 2D lanes, thereby providing crucial references for driving decisions and trajectory planning. While many efforts aim to improve prediction accuracy, we recognize that an efficient network can only bring its results closer to the lane model it learns from. If the modeling data is imprecise, the results may not accurately capture the real-world scenario. Accurate lane modeling is therefore essential to align prediction results closely with the environment. This study centers on efficient and accurate lane modeling, proposing a joint modeling approach that combines Bézier curves and interpolation methods. Based on this lane modeling approach, we developed a Global2Local Lane Matching method with Bézier Control-Point and Key-Point, which serves as a comprehensive solution that leverages hierarchical features with two mathematical models to ensure a precise match. We also introduce a novel 3D Spatial Encoder, representing an exploration of 3D surround-view lane detection research. The framework is suitable for front-view or surround-view 3D lane detection. By directly outputting the key points of lanes in 3D space, it overcomes the limitations of anchor-based methods, enabling accurate prediction of closed-loop or U-shaped lanes and effective adaptation to complex road conditions. The method establishes a new benchmark in front-view 3D lane detection on the Openlane dataset and achieves competitive performance in surround-view 2D lane detection on the Argoverse2 dataset.
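To make the two lane models named in the abstract concrete, the following is a minimal sketch of representing one annotated 3D lane both ways: as Bézier control points (fitted by least squares) and as a fixed number of interpolated key points. It is not the paper's implementation. The function names, the cubic degree, the count of 20 key points, and the chord-length parameterisation are all assumptions chosen for illustration.

```python
import numpy as np
from math import comb


def bernstein_matrix(t, degree):
    """Bernstein basis values; rows index parameters t, columns index basis functions."""
    t = np.asarray(t, dtype=float)[:, None]
    i = np.arange(degree + 1)[None, :]
    coeff = np.array([comb(degree, k) for k in range(degree + 1)], dtype=float)[None, :]
    return coeff * t**i * (1.0 - t) ** (degree - i)


def chord_length_params(points):
    """Normalised cumulative chord length in [0, 1] for an ordered polyline."""
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    return s / s[-1]


def fit_bezier(points, degree=3):
    """Least-squares Bezier control points for ordered 3D lane points (assumed degree)."""
    t = chord_length_params(points)
    B = bernstein_matrix(t, degree)
    ctrl, *_ = np.linalg.lstsq(B, points, rcond=None)
    return ctrl


def resample_key_points(points, num_keys=20):
    """Linearly interpolate the polyline at evenly spaced arc-length positions."""
    t = chord_length_params(points)
    t_new = np.linspace(0.0, 1.0, num_keys)
    cols = [np.interp(t_new, t, points[:, d]) for d in range(points.shape[1])]
    return np.stack(cols, axis=1)


# Hypothetical U-shaped lane: a half circle with a gentle height change.
theta = np.linspace(0.0, np.pi, 50)
lane = np.stack([10 * np.cos(theta), 10 * np.sin(theta), 0.1 * theta], axis=1)

ctrl = fit_bezier(lane)                  # global description: 4 control points
keys = resample_key_points(lane, 20)     # local description: 20 key points
curve = bernstein_matrix(np.linspace(0.0, 1.0, 20), 3) @ ctrl  # points on the fitted curve
```

The control points give a compact, smooth description of the whole lane, while the key points follow the annotation locally. A matching scheme that uses both, as the abstract describes, would compare predictions against each representation.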

Paper Structure (21 sections, 22 equations, 9 figures, 7 tables, 1 algorithm)

Figures (9)

  • Figure 1: Illustration of the impact of three modeling methods on lanes in front-view. Polynomial Curve: This method involves two polynomial functions, one describing the X-Y relationship and another for the Z-Y relationship. Interpolation Curve: This approach is based on interpolation and represented by a fixed set of key points. Bézier Curve with Control Points: This method yields smooth and accurate curves by manipulating control points. (A set of points sharing the same color governs the same lane line.)
  • Figure 2: Model overview. As illustrated, the network processes surround-view images and produces the 3D coordinates of key points for each lane, along with their respective categories. The 3D Spatial Encoder uses a voxel-like query to lift the 2D image features extracted by the Backbone into 3D space. The Transformer in the Map Decoder outputs the key-point features ${\bm Q}_k$ and the control-point features ${\bm Q}_c$ and feeds them to three matching branches. These three matching branches conduct Global2Local Lane Matching with Bézier Control-Point and Key-Point. For detailed information about ${\bm Q}_k$ and ${\bm Q}_c$, please refer to the Map Decoder under the 3D Lane Detection Network section. (The network architecture for front-view images is consistent with that for surround-view images.)
  • Figure 3: The process by which a query generates 3D features. For a voxel-like query, the 3D reference point $(x, y, z)$ is projected to the 2D point $(u, v)$ with the function $\mathcal{P}$. Using Deformable Attention (DeformAttn), sampling is performed around the target point $(u, v)$ to obtain a weighted sum of 2D features. The sampled 2D features are then projected back into the corresponding voxel of the 3D grid. After the entire 3D grid has been traversed, comprehensive 3D features are generated.
  • Figure 4: A comparison of the lane modeling capabilities of the three modeling methods. The Polynomial Curve performs suboptimally, whereas the Interpolation Curve and the Bézier Curve each show distinct advantages.
  • Figure 5: Visual comparison of 3D front-view lane results. It illustrates predictions versus annotated data (Pred&Ann), predictions versus modeling ground truth (Pred&GT), and annotated data versus modeling ground truth (Ann&GT) from both 3D and 2D perspectives. The Pred&Ann comparison highlights the network's predictive capability, while the Pred&GT comparison emphasizes precision. The Ann&GT comparison evaluates how well the lane modeling mirrors real-world conditions. Lastly, projecting the prediction results into image space further validates their effectiveness. Our network surpasses PersFormer in line position accuracy and in the rendition of complex lines.
  • ...and 4 more figures
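
The lifting step described in the Figure 3 caption (project each voxel's 3D reference point to the image with $\mathcal{P}$, sample 2D features there, and write them back to the voxel) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's encoder: plain bilinear sampling at $(u, v)$ stands in for Deformable Attention, $\mathcal{P}$ is assumed to be a pinhole projection, and the names `K`, `T_cam_from_ego`, and all function names are hypothetical.

```python
import numpy as np


def project_points(xyz, K, T_cam_from_ego):
    """Project (N, 3) ego-frame points to pixel coordinates with a pinhole model."""
    homo = np.concatenate([xyz, np.ones((xyz.shape[0], 1))], axis=1)
    cam = (T_cam_from_ego @ homo.T).T[:, :3]
    depth = cam[:, 2]
    valid = depth > 1e-6                      # keep only points in front of the camera
    pix = (K @ cam.T).T
    uv = pix[:, :2] / np.where(valid, depth, 1.0)[:, None]
    return uv, valid


def bilinear_sample(feat, uv):
    """Sample an (H, W, C) feature map at (N, 2) pixel positions (u = column, v = row)."""
    H, W, _ = feat.shape
    u = np.clip(uv[:, 0], 0, W - 1)
    v = np.clip(uv[:, 1], 0, H - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, W - 1)
    v1 = np.minimum(v0 + 1, H - 1)
    du = (u - u0)[:, None]
    dv = (v - v0)[:, None]
    top = feat[v0, u0] * (1 - du) + feat[v0, u1] * du
    bot = feat[v1, u0] * (1 - du) + feat[v1, u1] * du
    return top * (1 - dv) + bot * dv


def lift_to_voxels(feat, grid_xyz, K, T_cam_from_ego):
    """Fill an (X, Y, Z, C) voxel grid; voxels that project outside the image stay zero."""
    H, W, C = feat.shape
    pts = grid_xyz.reshape(-1, 3)
    uv, valid = project_points(pts, K, T_cam_from_ego)
    inside = (
        valid
        & (uv[:, 0] >= 0) & (uv[:, 0] <= W - 1)
        & (uv[:, 1] >= 0) & (uv[:, 1] <= H - 1)
    )
    out = np.zeros((pts.shape[0], C), dtype=float)
    out[inside] = bilinear_sample(feat, uv[inside])
    return out.reshape(*grid_xyz.shape[:3], C)


# Toy setup: feature channel 0 stores the column index, channel 1 the row index.
H, W = 48, 64
vv, uu = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
feat = np.stack([uu, vv], axis=-1).astype(float)
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
T = np.eye(4)  # assumed: ego frame coincides with the camera frame (z forward)
grid = np.array([[[[0.0, 0.0, 10.0], [0.0, 0.0, -5.0]]]])  # shape (1, 1, 2, 3)
voxels = lift_to_voxels(feat, grid, K, T)
```

For surround-view input, one plausible extension is to run the same projection per camera and combine the per-camera results for each voxel; how the paper combines views is not stated in this excerpt.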