BézierFormer: A Unified Architecture for 2D and 3D Lane Detection

Zhiwei Dong, Xi Zhu, Xiya Cao, Ran Ding, Wei Li, Caifa Zhou, Yongliang Wang, Qiangbo Liu

TL;DR

BézierFormer introduces a unified 2D/3D lane detection architecture that represents lanes as cubic Bézier curves and uses dynamic Bézier control point queries. A novel Bézier curve attention mechanism samples multiple reference points along each curve to extract comprehensive lane features, and a Chamfer IoU-based loss aligns predicted curves with ground truth for robust regression. The approach achieves state-of-the-art results on CurveLanes (2D) and OpenLane (3D) with efficient inference, supporting the case for a unified Bézier representation and curve-focused attention on slender lane structures. These results highlight the potential for further exploration of unified geometric representations in lane perception.

Abstract

Lane detection has made significant progress in recent years, but there is no unified architecture for its two sub-tasks: 2D lane detection and 3D lane detection. To fill this gap, we introduce BézierFormer, a unified 2D and 3D lane detection architecture based on a Bézier curve lane representation. BézierFormer formulates queries as Bézier control points and incorporates a novel Bézier curve attention mechanism. This attention mechanism enables comprehensive and accurate feature extraction for slender lane curves by sampling and fusing multiple reference points on each curve. In addition, we propose a novel Chamfer IoU-based loss that is better suited to Bézier control point regression. The state-of-the-art performance of BézierFormer on widely used 2D and 3D lane detection benchmarks verifies its effectiveness and suggests that the approach merits further exploration.
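
To make the curve representation concrete, the sketch below evaluates a cubic Bézier curve at evenly spaced parameter values using the standard Bernstein basis, which is one plausible way to obtain several reference points along a lane. The function name `sample_cubic_bezier`, the even spacing in `t`, and the example control points are illustrative assumptions; this section does not specify how BézierFormer actually places its reference points.

```python
import numpy as np


def sample_cubic_bezier(control_points, num_samples=10):
    """Evaluate a cubic Bézier curve at evenly spaced parameters in [0, 1].

    control_points: array-like of shape (4, D), with D = 2 for image-plane
    lanes or D = 3 for 3D lanes (hypothetical layout chosen for this sketch).
    Returns an array of shape (num_samples, D).
    """
    P = np.asarray(control_points, dtype=float)
    if P.ndim != 2 or P.shape[0] != 4:
        raise ValueError("a cubic Bézier curve needs exactly 4 control points")

    # Column vector of curve parameters t, shape (num_samples, 1).
    t = np.linspace(0.0, 1.0, num_samples)[:, None]

    # Cubic Bernstein basis, shape (num_samples, 4).
    basis = np.hstack([
        (1 - t) ** 3,
        3 * t * (1 - t) ** 2,
        3 * t ** 2 * (1 - t),
        t ** 3,
    ])

    # Each sampled point is a convex combination of the control points.
    return basis @ P


# Illustrative 2D control points (not taken from the paper).
ctrl_2d = [[0, 0], [1, 2], [3, 2], [4, 0]]
pts = sample_cubic_bezier(ctrl_2d, num_samples=5)
print(pts)
```

Because the same function accepts control points of shape (4, 2) or (4, 3), it also illustrates why a control-point parameterization lends itself to a shared treatment of 2D and 3D lanes.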

Paper Structure (23 sections, 15 equations, 8 figures, 7 tables)

Figures (8)

  • Figure 1: Pipeline of BézierFormer. We draw the 2D and 3D scenarios separately for clarity. (a) BézierFormer's decoder layer refines curve embeddings and 2D control points by extracting lane features according to the input control points. (b) In the 3D scenario, BézierFormer is equipped with perspective projection.
  • Figure 2: The architecture of BézierFormer. The feature extractor generates multi-scale image features $X$, and the Bézier curve decoder detects lanes from $X$. All decoder layers share the same structure. Each layer receives $X$, together with the control point queries and curve embeddings from the previous layer. The first decoder layer's inputs $E_{0}$ are learnable, and $C_{0}$ are generated from $X$.
  • Figure 3: Attention visualization. (a) Ordinary attention. (b) Deformable attention. (c) Bézier curve attention.
  • Figure 4: Illustration of different curve regression losses.
  • Figure 5: The architecture of three different decoders.
  • ...and 3 more figures