Lane Graph Extraction from Aerial Imagery via Lane Segmentation Refinement with Diffusion Models

Antonio Ruiz, Andrew Melnik, Nicolo Savioli, Dong Wang, Yanfeng Zhang, Helge Ritter

TL;DR

This work addresses lane-graph extraction from aerial imagery by integrating a CNN-based lane segmentation stage with a conditional diffusion-model refinement stage, followed by a standard segmentation-to-graph conversion. The diffusion refinement is initialized from the CNN output and executed with a deterministic DDIM sampler, yielding sharper, more complete lane masks that improve graph connectivity. Quantitative results show gains over CNN-only and diffusion-ensemble baselines in GEO and TOPO metrics, particularly in connectivity, with ablation studies validating the contribution of the conditioning strategies and sampling steps. The approach offers a scalable, flexible pipeline for accurate lane graphs over large areas, with potential extensions to intersections and directed graphs, and suggests diffusion-based refinement as a practical enhancement for aerial-lane topologies.

Abstract

The lane graph is critical for applications such as autonomous driving and lane-level route planning. Previous research has extracted lane-level graphs from aerial imagery using convolutional neural networks (CNNs) followed by post-processing segmentation-to-graph algorithms, but these methods often struggle to produce sharp and complete segmentation masks. Occlusions, variations in lighting, and changes in road texture can lead to incomplete and inaccurate lane masks, resulting in poor-quality lane graphs. To address these challenges, we propose a novel approach that uses diffusion models to refine the lane masks output by a CNN. Experimental results on a publicly available dataset demonstrate that our method outperforms existing methods based solely on CNNs or diffusion models, particularly in terms of graph connectivity. Our lane mask refinement approach enhances the quality of the extracted lane graph, yielding gains of approximately 1.5% in GEO F1 and 3.5% in TOPO F1 scores over the best-performing CNN-based method, and improvements of 28% and 34%, respectively, compared to a prior diffusion-based approach. Both GEO F1 and TOPO F1 scores are critical metrics for evaluating lane graph quality. Additionally, ablation studies evaluate the individual components of our approach, providing insights into their respective contributions and effectiveness.

Paper Structure

This paper contains 21 sections, 8 equations, 7 figures, 7 tables.

Figures (7)

  • Figure S1: Challenging scenarios for CNNs. The first column shows aerial image patches containing challenging scenarios highlighted by red dotted boxes. The second column illustrates the segmentation masks produced by LaneSegmentation [he2022lane] (a CNN-based model), while the third column displays the masks predicted by our model. The first two rows highlight regions affected by occlusion, caused by queues of cars in the first row and by trees in the second row. The third row depicts a case involving a change in road texture, and the fourth illustrates the impact of lighting variation. In all these scenarios, CNNs struggle to accurately segment the lanes, whereas our model consistently produces sharp and complete lane segmentation masks.
  • Figure S2: Visual results for a region of testing tile A, comparing the outputs of the following methods: (1) LaneExtraction [he2022lane] (top row), (2) an ensemble of diffusion models [wu2022medsegdiff] (middle row), and (3) our method (bottom row). The first column shows the input aerial RGB image (top), ground truth segmentation mask (middle), and ground truth lane graph (bottom); the second and third columns display predicted lane segmentation masks and their corresponding lane graphs (used for computing the metrics). In the lane graphs (third column), green nodes indicate matched nodes, blue nodes represent false positives, and red nodes represent false negatives. Nodes appear as short line segments due to close spacing. Our method exhibits improved topological continuity and sharper lane segments compared to the baselines.
  • Figure S3: Visual results for a region of testing tile B. Same arrangement as for tile A.
  • Figure S4: Overall pipeline of our method. During inference, the aerial RGB patch is first fed into D-LinkNet [zhou2018d], which outputs an unrefined segmentation mask and an unrefined direction map. Gaussian noise is then added to the unrefined segmentation mask to create the starting latent variable $x_T$ for DDIM [song2020denoising] sampling, so that sampling starts from a noised version of the CNN output rather than from pure Gaussian noise. After several sampling steps, a refined segmentation mask $x_0$ is generated. In Stage 2, the blue dotted arrows indicate conditioning, while the solid black arrow represents the input variables for the diffusion model. Finally, the refined segmentation mask is passed to the segmentation-to-graph algorithm to produce the final lane graph.
  • Figure S5: Conditional DDIM [song2020denoising] sampling process. Gaussian noise is added to the unrefined segmentation mask (from Stage 1) to generate the initial latent variable $x_T$. Then, several DDIM sampling steps, as described in Equation \ref{eq:x-t-delta-equation}, are applied to progressively refine the segmentation mask, resulting in the final output $x_0$.
  • ...and 2 more figures
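
The sampling procedure described in the captions of Figures S4 and S5 can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the linear beta schedule, the function names (`make_alpha_bar`, `noise_mask`, `ddim_refine`), the `eps_model` callable standing in for the trained conditional denoiser, the `cond` argument standing in for the conditioning inputs, and the `t_start` and `num_steps` parameters are all hypothetical. The update used is the standard deterministic DDIM step (eta = 0); any rescaling of the mask values is left out.

```python
import numpy as np


def make_alpha_bar(num_train_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal level for an assumed linear beta schedule.

    Index 0 is set to 1.0 (no noise), so index t in 1..num_train_steps
    corresponds to t forward-diffusion steps.
    """
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.concatenate([[1.0], np.cumprod(1.0 - betas)])


def noise_mask(mask, t, alpha_bar, rng):
    """Forward-diffuse a mask to timestep t:
    x_t = sqrt(alpha_bar_t) * mask + sqrt(1 - alpha_bar_t) * noise."""
    noise = rng.standard_normal(mask.shape)
    return np.sqrt(alpha_bar[t]) * mask + np.sqrt(1.0 - alpha_bar[t]) * noise


def ddim_refine(coarse_mask, eps_model, alpha_bar, t_start, num_steps, rng, cond=None):
    """Refine a coarse mask with deterministic DDIM steps.

    eps_model(x_t, t, cond) is a hypothetical noise predictor; in the paper's
    setting it would be the trained conditional diffusion network.
    """
    # Evenly spaced timesteps from t_start down to 0 (0 means fully denoised).
    timesteps = np.linspace(t_start, 0, num_steps + 1).round().astype(int)

    # Start from the noised CNN output instead of pure Gaussian noise.
    x = noise_mask(coarse_mask, timesteps[0], alpha_bar, rng)

    for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
        eps = eps_model(x, t, cond)
        # Estimate of the clean mask implied by the predicted noise.
        x0_pred = (x - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
        # Deterministic DDIM update: re-noise the estimate to level t_prev
        # using the same predicted noise (no fresh randomness is injected).
        x = np.sqrt(alpha_bar[t_prev]) * x0_pred + np.sqrt(1.0 - alpha_bar[t_prev]) * eps
    return x
```

Because `alpha_bar[0]` is 1.0, the last update returns the final clean-mask estimate directly. Choosing `t_start` below the full schedule length keeps part of the CNN output's structure in $x_T$, which is the point of initializing from the unrefined mask; how much noise to add is a design choice this sketch leaves open.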