Rethinking Lanes and Points in Complex Scenarios for Monocular 3D Lane Detection

Yifan Chang, Junjie Huang, Xiaofeng Wang, Yun Ye, Zhujin Liang, Yi Shan, Dalong Du, Xingang Wang

TL;DR

This work identifies fundamental flaws in sparse-point monocular 3D lane detection, showing that endpoint truncation in training ground truth can introduce substantial errors. It introduces an endpoint patching strategy and an EndPoint head (EP-head) that predicts patching distances, enabling more complete lane representations with fewer preset points. To further leverage lane geometry, the authors propose PointLane attention (PL-attention), which integrates within-lane, cross-lane, and same-y interactions as priors in the attention mechanism. Across multiple state-of-the-art baselines on OpenLane, EP-head and PL-attention yield consistent improvements in F1-score (e.g., +4.4 on PersFormer, +3.2 on Anchor3DLane, +2.8 on LATR), demonstrating enhanced robustness in complex scenarios and potential applicability to 2D lane detection and HD map construction.

Abstract

Monocular 3D lane detection is a fundamental task in autonomous driving. Although sparse-point methods lower computational load and maintain high accuracy in complex lane geometries, current methods fail to fully leverage the geometric structure of lanes in both lane geometry representations and model design. For lane geometry representations, we present a theoretical analysis alongside experimental validation showing that current sparse lane representation methods contain inherent flaws, resulting in potential errors of up to 20 m, which raise significant safety concerns for driving. To address this issue, we propose a novel patching strategy that represents the full lane structure. To enable existing models to match this strategy, we introduce the EndPoint head (EP-head), which adds a patching distance to endpoints. The EP-head enables the model to predict more complete lane representations even with fewer preset points, addressing existing limitations and paving the way for future models that are faster and require fewer parameters. For model design, to enhance the model's perception of lane structures, we propose PointLane attention (PL-attention), which incorporates prior geometric knowledge into the attention mechanism. Extensive experiments demonstrate the effectiveness of the proposed methods on various state-of-the-art models. For instance, in terms of the overall F1-score, our methods improve PersFormer by 4.4 points, Anchor3DLane by 3.2 points, and LATR by 2.8 points. The code will be available soon.
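
The truncation issue and the patching idea described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that lanes are sampled at fixed preset y-positions and that patching distances are measured along the y-axis only; the function names, the preset grid, and the example lane are hypothetical and are not taken from the paper's code.

```python
from bisect import bisect_left, bisect_right


def short_mode_points(y_start, y_end, preset_ys):
    """Return the preset y-positions that fall inside the lane's true extent.

    This mimics a truncating ("short mode") training ground truth:
    anything between the outermost kept preset point and the true
    endpoint is lost.
    """
    lo = bisect_left(preset_ys, y_start)
    hi = bisect_right(preset_ys, y_end)
    return preset_ys[lo:hi]


def patch_distances(y_start, y_end, preset_ys):
    """Return (near_patch, far_patch): the gaps truncation would discard.

    Assumption: a patching distance is the y-gap between the outermost
    kept preset point and the true lane endpoint. An endpoint head could
    regress these two values per lane.
    """
    kept = short_mode_points(y_start, y_end, preset_ys)
    if not kept:
        return None  # the lane does not cover any preset point
    return kept[0] - y_start, y_end - kept[-1]


# Hypothetical preset grid (metres ahead of the camera) and lane extent.
preset_ys = [5.0, 10.0, 20.0, 30.0, 40.0, 50.0]
kept = short_mode_points(3.5, 47.5, preset_ys)
near_patch, far_patch = patch_distances(3.5, 47.5, preset_ys)
print(kept)                   # preset points retained by truncation
print(near_patch, far_patch)  # lane length lost at each end
```

In this toy setup, a coarser preset grid leaves larger gaps at the endpoints, which is why predicting the patching distances matters more as the number of preset points shrinks.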

Paper Structure

This paper contains 34 sections, 14 equations, 10 figures, 11 tables, 3 algorithms.

Figures (10)

  • Figure 1: (a) Our method patches both endpoints and uses the EP-head to predict the patching distance, bringing the training ground truth closer to the original ground truth. (b) The short mode truncates both ends of the original ground truth, while (c) the long mode extends them. Both modes produce inaccurate training ground truth and fail to fully capture the original ground truth.
  • Figure 2: Lane length distribution comparisons between ApolloSim and OpenLane.
  • Figure 3: Overview of PL-attention, which combines point-point attention (Point-Attn), lane-lane attention (Lane-Attn), and point-y attention (PointY-Attn).
  • Figure 4: Curve-based Representations. The black lane represents the training ground truth, while the yellow lane represents the training prediction. Even if the shape of the prediction is accurate, any deviation in the endpoint position can cause the entire lane to shift significantly.
  • Figure 5: (1) The current sparse lane representation approach truncates the original ground truth, leading to an incomplete representation in the training ground truth. (2) In contrast, our method patches both endpoints and employs the EP-head to predict the patching distances, allowing the training prediction to more closely align with the original ground truth.
  • ...and 5 more figures
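
The point groupings named in the Figure 3 caption can be sketched as boolean attention masks. This is only one possible reading: it assumes point queries are laid out lane by lane, with a fixed number of preset points per lane, and it covers the within-lane and same-y groupings only. The function name and layout are hypothetical, and the paper's actual PL-attention may be implemented differently.

```python
def build_point_masks(num_lanes, points_per_lane):
    """Build two boolean masks over flattened point queries.

    Assumed layout: query index = lane_index * points_per_lane + point_index.
    mask[i][j] is True when query i is allowed to attend to query j.
    """
    total = num_lanes * points_per_lane
    # Points attend to other points on the same lane.
    within_lane = [
        [i // points_per_lane == j // points_per_lane for j in range(total)]
        for i in range(total)
    ]
    # Points attend to points at the same preset y-index on any lane.
    same_y = [
        [i % points_per_lane == j % points_per_lane for j in range(total)]
        for i in range(total)
    ]
    return within_lane, same_y


within_lane, same_y = build_point_masks(num_lanes=2, points_per_lane=3)
for row in within_lane:
    print(["x" if allowed else "." for allowed in row])
```

Masks of this shape could be passed to a standard attention layer to restrict which queries interact; lane-level interactions would then operate on one token per lane rather than on individual points.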