Sparse Laneformer

Ji Liu; Zifeng Zhang; Mingjie Lu; Hongyang Wei; Dong Li; Yile Xie; Jinzhang Peng; Lu Tian; Ashish Sirasao; Emad Barsoum

TL;DR

Sparse Laneformer addresses lane-detection efficiency and generalization by replacing dense anchors with sparse, image-adaptive anchors defined by position-aware lane queries and angle queries. It introduces a two-stage transformer decoder with Horizontal Perceptual Attention (HPA), Lane-Angle Cross Attention (LACA), and Lane Perceptual Attention (LPA) to robustly model lane geometry and refine predictions. The method uses dynamic rotation of anchors around a rotation point and learns offsets to form final lanes via $\mathcal{P} = \mathcal{A} + \mathcal{O}$, trained end-to-end with Hungarian matching and a composite loss including Line IoU. Experiments on CULane, TuSimple, and LLAMAS show competitive or superior F1 scores with fewer MACs compared to state-of-the-art anchor-based methods, demonstrating effective efficiency-accuracy trade-offs. The work suggests a promising direction for sparse, transformer-based lane detection and hints at future extensions to 3D lane estimation.
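The anchor construction summarized above (a vertical base anchor rotated around a rotation point by a predicted angle, then refined by learned offsets so that $\mathcal{P} = \mathcal{A} + \mathcal{O}$) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the row-wise sampling of x-coordinates, and the sign convention for the angle (measured from the vertical axis) are assumptions made for illustration only.

```python
import math


def rotate_vertical_anchor(rotation_point, theta, ys):
    """Sample x-coordinates of a rotated anchor at the image rows `ys`.

    Assumption: the base anchor is the vertical line x = x0 through the
    rotation point (x0, y0), and `theta` (radians) is the tilt from vertical.
    A line through (x0, y0) tilted by theta satisfies
    x = x0 + (y - y0) * tan(theta).
    """
    x0, y0 = rotation_point
    return [x0 + (y - y0) * math.tan(theta) for y in ys]


def apply_offsets(anchor_xs, offsets):
    """Form lane points P = A + O from anchor x-coordinates and per-row offsets."""
    if len(anchor_xs) != len(offsets):
        raise ValueError("anchor and offsets must have the same length")
    return [a + o for a, o in zip(anchor_xs, offsets)]


# Hypothetical values: one anchor, three sampled rows.
rows = [50.0, 60.0, 70.0]
anchor = rotate_vertical_anchor((100.0, 50.0), math.pi / 4, rows)
lane = apply_offsets(anchor, [1.0, -2.0, 0.5])
```

With `theta = 0` the anchor stays vertical, so every sampled x-coordinate equals that of the rotation point; the offsets then account for whatever curvature a straight anchor cannot express.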

Abstract

Lane detection is a fundamental task in autonomous driving and has made great progress with the rise of deep learning. Previous anchor-based methods often design dense anchors, which depend heavily on the training dataset and remain fixed during inference. We argue that dense anchors are not necessary for lane detection and propose a transformer-based lane detection framework built on a sparse anchor mechanism. To this end, we generate sparse anchors with position-aware lane queries and angle queries instead of traditional explicit anchors. We adopt Horizontal Perceptual Attention (HPA) to aggregate lane features along the horizontal direction, and Lane-Angle Cross Attention (LACA) to model interactions between lane queries and angle queries. We also propose Lane Perceptual Attention (LPA), based on deformable cross attention, to further refine the lane predictions. Our method, named Sparse Laneformer, is easy to implement and end-to-end trainable. Extensive experiments demonstrate that Sparse Laneformer performs favorably against state-of-the-art methods, e.g., surpassing Laneformer by 3.0% F1 score and O2SFormer by 0.7% F1 score with fewer MACs on CULane with the same ResNet-34 backbone.

Paper Structure (15 sections, 7 equations, 5 figures, 5 tables)

INTRODUCTION
RELATED WORK
Lane Detection
Generic Object Detection with Sparse Anchors
METHOD
Sparse Anchor Design
Transformer Decoder Design
First-Stage Decoder
Second-Stage Decoder
End-to-End Training
EXPERIMENTS
Experimental Setting
Comparison with State-of-the-Art Methods
Ablation Study
CONCLUSIONS

Figures (5)

Figure 1: Comparison with the anchor setting of state-of-the-art methods. Our method has sparse anchors. In (d), base vertical lane angles (marked with green dashed lines) rotate with the predicted angles around defined rotation points to generate dynamic anchors (marked with yellow lines) for every input image.
Figure 2: Dynamic anchor generation. The initialized vertical anchors are rotated around the rotation point by $\theta$, which is predicted by angle queries.
Figure 3: Overview of the proposed Sparse Laneformer. "Fusion" represents Eq. \ref{eq:comb}. See text for details.
Figure 4: Sketch map of Horizontal Perceptual Attention (HPA). Each element in lane queries will only focus on features in one horizontal region of the image.
Figure 5: Sketch map of Lane Perceptual Attention (LPA). Lane queries only interact with the reference points given by the lane predictions from the first-stage decoder, which encourages the network to focus on the local contextual details of lane features.
