LaneDiffusion: Improving Centerline Graph Learning via Prior Injected BEV Feature Generation

Zijie Wang, Weiming Zhang, Wei Zhang, Xiao Tan, Hongxing Liu, Yaowei Wang, Guanbin Li

TL;DR

LaneDiffusion tackles the challenge of centerline graph learning under occlusion and ambiguity by introducing a diffusion-based paradigm that generates lane priors at the BEV feature level. The method relies on two modular components: LPIM, which injects high-precision lane priors into BEV features to create diffusion targets, and LPDM, which models these prior-injected BEV features with a conditioned diffusion process and refines the result before decoding into vectorized centerlines and topology. With staged optimization, the approach achieves state-of-the-art performance on nuScenes and Argoverse2 across both fine-grained point-level and segment-level metrics, demonstrating the effectiveness of diffusion for probabilistic lane reasoning. The work offers a flexible, integrable add-on for BEV-based architectures and highlights the potential of generative models to handle centerline topology in autonomous driving, with future work aimed at real-time efficiency and lighter-weight deployment.

Abstract

Centerline graphs, crucial for path planning in autonomous driving, are traditionally learned using deterministic methods. However, these methods often lack spatial reasoning and struggle with occluded or invisible centerlines. Generative approaches, despite their potential, remain underexplored in this domain. We introduce LaneDiffusion, a novel generative paradigm for centerline graph learning. Instead of directly predicting vectorized centerlines, LaneDiffusion employs diffusion models to generate lane centerline priors at the Bird's Eye View (BEV) feature level. Our method integrates a Lane Prior Injection Module (LPIM) and a Lane Prior Diffusion Module (LPDM) to construct diffusion targets and manage the diffusion process, respectively. Vectorized centerlines and topologies are then decoded from these prior-injected BEV features. Extensive evaluations on the nuScenes and Argoverse2 datasets demonstrate that LaneDiffusion significantly outperforms existing methods, achieving improvements of 4.2%, 4.6%, 4.7%, 6.4% and 1.8% on fine-grained point-level metrics (GEO F1, TOPO F1, JTOPO F1, APLS and SDA) and 2.3%, 6.4%, 6.8% and 2.1% on segment-level metrics (IoU, mAP_cf, DET_l and TOP_ll). These results establish state-of-the-art performance in centerline graph learning, offering new insights into generative models for this task.

Paper Structure

This paper contains 28 sections, 13 equations, 3 figures, 5 tables, 2 algorithms.

Figures (3)

  • Figure 1: The Motivation. This illustration presents two challenging cases where the state-of-the-art deterministic approach, CGNet, encounters difficulties in handling occlusions or ambiguous visual cues. Our generative framework LaneDiffusion offers a supplementary solution that specifically mitigates these challenges through probabilistic modeling.
  • Figure 2: The overall framework of LaneDiffusion. LaneDiffusion comprises two key components: (i) a Lane Prior Injection Module (LPIM), which injects lane priors into BEV features to construct the diffusion target, and (ii) a Lane Prior Diffusion Module (LPDM), which models the prior-injected BEV feature via a diffusion process conditioned on the original feature. The modular design of LaneDiffusion makes it a flexible add-on for BEV feature-based architectures.
  • Figure 3: Qualitative comparisons under different weather and lighting conditions on nuScenes.
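
The conditioning scheme described in Figure 2 can be illustrated with a small sketch. The summary above does not specify the noise schedule, the network, or how LPIM builds its target, so everything below is an assumption made for illustration: a standard DDPM-style forward process with a linear beta schedule, a random perturbation standing in for the LPIM output, and channel-wise concatenation as the conditioning mechanism. The names `linear_beta_schedule`, `q_sample`, and `denoiser_input` are hypothetical and are not taken from the paper.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linear noise schedule (an assumed choice, not confirmed by the paper)."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0, t, alphas_cumprod, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I), the DDPM forward step."""
    noise = rng.standard_normal(x0.shape)
    abar_t = alphas_cumprod[t]
    x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * noise
    return x_t, noise


def denoiser_input(x_t, bev_condition):
    """Concatenate the noisy target and the original BEV feature along channels.

    Concatenation is one common way to condition a denoiser; it is an
    assumption here, not the paper's documented mechanism.
    """
    return np.concatenate([x_t, bev_condition], axis=0)


rng = np.random.default_rng(0)
C, H, W = 8, 16, 16

# Original BEV feature from the perception backbone (random stand-in).
bev_feature = rng.standard_normal((C, H, W))

# Stand-in for the LPIM output: the real module injects lane priors into the
# BEV feature; here we only perturb it so the shapes line up.
prior_injected = bev_feature + 0.1 * rng.standard_normal((C, H, W))

betas = linear_beta_schedule(1000)
alphas_cumprod = np.cumprod(1.0 - betas)

# Noise the diffusion target at an intermediate timestep.
t = 500
x_t, noise = q_sample(prior_injected, t, alphas_cumprod, rng)

# A denoising network would receive this tensor and be trained to recover
# the noise (or the clean target) given the original BEV feature.
net_in = denoiser_input(x_t, bev_feature)
print(net_in.shape)
```

In this sketch the diffusion target is the prior-injected feature while the condition is the original feature, mirroring the split between LPIM and LPDM in the caption. At inference time no lane prior would be available, so the model would start from noise and denoise toward a prior-injected feature using only the original BEV feature as guidance.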