
Sparse Laneformer

Ji Liu, Zifeng Zhang, Mingjie Lu, Hongyang Wei, Dong Li, Yile Xie, Jinzhang Peng, Lu Tian, Ashish Sirasao, Emad Barsoum

TL;DR

Sparse Laneformer targets the efficiency and generalization of lane detection by replacing dense, fixed anchors with sparse, image-adaptive anchors generated from position-aware lane queries and angle queries. It uses a two-stage transformer decoder in which Horizontal Perceptual Attention (HPA) aggregates lane features along the horizontal direction, Lane-Angle Cross Attention (LACA) lets lane queries and angle queries interact, and Lane Perceptual Attention (LPA) refines the lane predictions. Initial vertical anchors are rotated around a rotation point by a predicted angle to form dynamic anchors $\mathcal{A}$, and learned offsets $\mathcal{O}$ give the final lanes via $\mathcal{P} = \mathcal{A} + \mathcal{O}$. The model is trained end-to-end with Hungarian matching and a composite loss that includes Line IoU. Experiments on CULane, TuSimple, and LLAMAS show competitive or superior F1 scores with fewer MACs than state-of-the-art anchor-based methods. The work points toward sparse, transformer-based lane detection and hints at future extensions to 3D lane estimation.
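
The dynamic anchor step summarized above (rotate a vertical anchor around a rotation point, then add learned offsets to obtain $\mathcal{P} = \mathcal{A} + \mathcal{O}$) can be illustrated with a minimal sketch. The function names, the choice of measuring the angle from the vertical axis, and the sign convention are assumptions made for illustration; the summary does not specify the paper's exact parameterization.

```python
import math


def rotate_vertical_anchor(x_rot, y_rot, theta, ys):
    """Sample a rotated anchor line at the image rows `ys`.

    Assumption: the anchor starts as the vertical line x = x_rot and is
    rotated by `theta` radians (measured from the vertical axis) around
    the rotation point (x_rot, y_rot). A line through that point at angle
    theta from vertical satisfies x = x_rot + (y - y_rot) * tan(theta).
    """
    slope = math.tan(theta)
    return [x_rot + (y - y_rot) * slope for y in ys]


def apply_offsets(anchor_xs, offsets):
    """Combine anchor and offsets row by row: P = A + O."""
    if len(anchor_xs) != len(offsets):
        raise ValueError("anchor and offsets must have the same length")
    return [a + o for a, o in zip(anchor_xs, offsets)]


if __name__ == "__main__":
    rows = [0.0, 10.0, 20.0]                 # hypothetical sampled rows
    anchor = rotate_vertical_anchor(50.0, 20.0, math.atan(0.5), rows)
    lane = apply_offsets(anchor, [1.0, -2.0, 0.5])
    print(anchor)
    print(lane)
```

With `theta = 0` the anchor stays vertical, so every sampled x-coordinate equals `x_rot`; a nonzero angle tilts the line while keeping it pinned at the rotation point.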

Abstract

Lane detection is a fundamental task in autonomous driving and has progressed rapidly with the rise of deep learning. Previous anchor-based methods often design dense anchors, which depend heavily on the training dataset and remain fixed during inference. We argue that dense anchors are not necessary for lane detection and propose a transformer-based lane detection framework built on a sparse anchor mechanism. To this end, we generate sparse anchors with position-aware lane queries and angle queries instead of traditional explicit anchors. We adopt Horizontal Perceptual Attention (HPA) to aggregate lane features along the horizontal direction, and Lane-Angle Cross Attention (LACA) to model interactions between lane queries and angle queries. We also propose Lane Perceptual Attention (LPA), based on deformable cross attention, to further refine the lane predictions. Our method, named Sparse Laneformer, is easy to implement and end-to-end trainable. Extensive experiments show that Sparse Laneformer performs favorably against state-of-the-art methods, e.g., surpassing Laneformer by 3.0% F1 score and O2SFormer by 0.7% F1 score with fewer MACs on CULane with the same ResNet-34 backbone.

Paper Structure

This paper contains 15 sections, 7 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Comparison with the anchor settings of state-of-the-art methods. Our method uses sparse anchors. In (d), base vertical lane angles (marked with green dashed lines) rotate by the predicted angles around defined rotation points to generate dynamic anchors (marked with yellow lines) for every input image.
  • Figure 2: Dynamic anchor generation. The initialized vertical anchors are rotated around the rotation point by $\theta$, which is predicted by angle queries.
  • Figure 3: Overview of the proposed Sparse Laneformer. "Fusion" represents Eq. \ref{eq:comb}. See text for details.
  • Figure 4: Sketch of Horizontal Perceptual Attention (HPA). Each element of a lane query attends only to features in one horizontal region of the image.
  • Figure 5: Sketch of Lane Perceptual Attention (LPA). Lane queries interact only with reference points derived from the lane predictions of the first-stage decoder, which encourages the network to focus on the local contextual details of lane features.
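
The idea in the Figure 4 caption, that each lane-query element only looks at one horizontal region, can be sketched as attention restricted to a band of feature-map rows. This is a simplified illustration under stated assumptions, not the paper's implementation: the band layout (equal-height bands, one per query element), the function names, and the use of plain scaled dot-product attention without learned projections are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def horizontal_band_attention(query_rows, feature_map):
    """Attend each query element to a single horizontal band of features.

    query_rows:  R vectors of length C (one lane query split into R elements).
    feature_map: nested list of shape H x W x C, with H divisible by R.
    Returns R output vectors; element r is a weighted sum of the features
    in band r only (assumed layout: band r covers rows r*H/R .. (r+1)*H/R - 1).
    """
    num_rows = len(query_rows)
    height = len(feature_map)
    if height % num_rows != 0:
        raise ValueError("feature-map height must be divisible by the number of query rows")
    band = height // num_rows
    dim = len(query_rows[0])
    outputs = []
    for r, query in enumerate(query_rows):
        # Keys and values are the features inside this element's band.
        keys = [cell for y in range(r * band, (r + 1) * band) for cell in feature_map[y]]
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
        weights = softmax(scores)
        outputs.append([sum(w * key[c] for w, key in zip(weights, keys)) for c in range(dim)])
    return outputs
```

Restricting the key set per query element keeps the attention cost proportional to the band size rather than the full feature map, which is one plausible reading of why such a horizontal restriction suits a sparse-anchor design.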