Lane Graph Extraction from Aerial Imagery via Lane Segmentation Refinement with Diffusion Models
Antonio Ruiz, Andrew Melnik, Nicolo Savioli, Dong Wang, Yanfeng Zhang, Helge Ritter
TL;DR
This work addresses lane-graph extraction from aerial imagery by integrating a CNN-based lane segmentation stage with a conditional diffusion-model refinement stage, followed by a standard segmentation-to-graph conversion. The diffusion refinement is initialized from the CNN output and executed with a deterministic DDIM sampler, yielding sharper, more complete lane masks that improve graph connectivity. Quantitative results show gains over CNN-only and diffusion-ensemble baselines in GEO and TOPO metrics, particularly in connectivity, with ablation studies validating the contribution of the conditioning strategies and sampling steps. The approach offers a scalable, flexible pipeline for accurate lane graphs over large areas, with potential extensions to intersections and directed graphs, and suggests diffusion-based refinement as a practical enhancement for aerial-lane topologies.
Abstract
The lane graph is critical for applications such as autonomous driving and lane-level route planning. Previous research has focused on extracting lane-level graphs from aerial imagery using convolutional neural networks (CNNs) followed by post-processing segmentation-to-graph algorithms, but these methods often struggle to produce sharp and complete segmentation masks. Occlusions, variations in lighting, and changes in road texture can lead to incomplete and inaccurate lane masks, resulting in poor-quality lane graphs. To address these challenges, we propose a novel approach that uses diffusion models to refine the lane masks output by a CNN. Experimental results on a publicly available dataset demonstrate that our method outperforms existing methods based solely on CNNs or diffusion models, particularly in terms of graph connectivity. Our lane mask refinement approach enhances the quality of the extracted lane graph, yielding gains of approximately 1.5% in GEO F1 and 3.5% in TOPO F1 scores over the best-performing CNN-based method, and improvements of 28% and 34%, respectively, compared to a prior diffusion-based approach. GEO F1 and TOPO F1 are both critical metrics for evaluating lane graph quality. Additionally, ablation studies evaluate the individual components of our approach, providing insights into their respective contributions and effectiveness.
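As a rough illustration of the refinement stage summarized above, the following sketch runs a deterministic DDIM update (the eta = 0 case) starting from a CNN lane mask. It is not the authors' implementation. The names `make_alpha_bar`, `ddim_refine`, `eps_model`, and `t_start` are hypothetical, and the linear beta schedule, the [-1, 1] mask range, and the choice to initialize by noising the CNN mask to an intermediate timestep are all assumptions made for the example.

```python
import numpy as np


def make_alpha_bar(num_train_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)


def ddim_refine(cnn_mask, aerial_image, eps_model, alpha_bar,
                t_start=400, num_steps=20, seed=0):
    """Refine a CNN lane mask with deterministic DDIM sampling (eta = 0).

    cnn_mask     : array in [-1, 1], the coarse lane mask from the CNN stage.
    aerial_image : conditioning image, passed through to the denoiser.
    eps_model    : hypothetical callable (x_t, t, image, cnn_mask) -> predicted noise.
    t_start      : assumed intermediate timestep at which sampling begins.
    """
    rng = np.random.default_rng(seed)
    # Descending timesteps from t_start down to 0.
    timesteps = np.linspace(t_start, 0, num_steps).round().astype(int)

    # Assumed initialization: forward-diffuse the CNN mask to t_start.
    a_start = alpha_bar[timesteps[0]]
    noise = rng.standard_normal(cnn_mask.shape)
    x = np.sqrt(a_start) * cnn_mask + np.sqrt(1.0 - a_start) * noise

    for i, t in enumerate(timesteps):
        a_t = alpha_bar[t]
        # After the last step we land on the clean sample (alpha_bar = 1).
        a_prev = alpha_bar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0

        eps = eps_model(x, t, aerial_image, cnn_mask)
        # Estimate the clean mask implied by the current noise prediction.
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        # Deterministic DDIM update: no fresh noise is injected.
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps

    return x
```

In a real pipeline `eps_model` would be a trained conditional denoising network, and the refined mask would be thresholded before being passed to the segmentation-to-graph step. Because the update injects no noise after initialization, a fixed seed gives a reproducible result.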
