Learning Lane Graphs from Aerial Imagery Using Transformers
Martin Büchner, Simon Dorer, Abhinav Valada
TL;DR
This work addresses the challenge of generating high-fidelity lane topologies for autonomous navigation from aerial imagery by predicting successor lane graphs. It introduces the Aerial Lane Graph Transformer (ALGT), a DETR-based framework that predicts a set of path proposals representing maximal-length traversals of the lane graph and aggregates them into a final directed graph. Ablation studies show that polyline path representations outperform Bézier parametrizations, that PSPNet-based backbones yield better features, and that increased encoder/decoder capacity improves path predictions, with ALGT achieving superior path-level accuracy (APLS) while remaining competitive on topology metrics compared to the LaneGNN baseline. The approach reduces node-position errors common in onboard-sensor methods and holds promise for robust planning in complex urban environments.
Abstract
The robust and safe operation of automated vehicles depends on detailed and accurate topological maps. Central to such maps are lane graphs, which encode lane connectivity and are essential for navigating complex urban environments autonomously. While transformer-based models have been effective in creating map topologies from vehicle-mounted sensor data, their potential for generating such graphs from aerial imagery remains untapped. This work introduces a novel approach to generating successor lane graphs from aerial imagery using transformer models. We frame successor lane graphs as a collection of maximal-length paths and predict them using a Detection Transformer (DETR) architecture. We demonstrate the efficacy of our method through extensive experiments on the diverse and large-scale UrbanLaneGraph dataset, illustrating its accuracy in generating successor lane graphs and highlighting its potential for enhancing autonomous vehicle navigation in complex environments.
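The framing of a successor lane graph as a collection of maximal-length paths can be illustrated with a small sketch. The code below is not the ALGT implementation: it assumes a toy graph with integer node IDs and no cycles, whereas the model predicts continuous path proposals from imagery. The function names `maximal_paths` and `aggregate_paths` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def maximal_paths(edges, root):
    """Enumerate every root-to-leaf path of a directed acyclic graph.

    Assumption: the graph has no cycles; a cycle would make this loop forever.
    """
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)

    paths = []
    stack = [[root]]
    while stack:
        path = stack.pop()
        next_nodes = successors.get(path[-1], [])
        if not next_nodes:
            # No successor: the path cannot be extended, so it is maximal.
            paths.append(path)
        for v in next_nodes:
            stack.append(path + [v])
    return paths


def aggregate_paths(paths):
    """Merge paths back into a directed edge set (exact node IDs assumed)."""
    edges = set()
    for path in paths:
        edges.update(zip(path, path[1:]))
    return edges


# Toy successor graph: the lane splits at node 1 into two branches.
toy_edges = {(0, 1), (1, 2), (1, 3), (3, 4)}
toy_paths = maximal_paths(toy_edges, root=0)
recovered_edges = aggregate_paths(toy_paths)
```

In this toy setting the two maximal paths share the prefix `0 -> 1`, and taking the union of their edges recovers the original graph exactly. With predicted paths the shared segments would only coincide approximately, so an aggregation step has to merge nearby points rather than match identical IDs.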
