LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement
Renyuan Peng, Xinyue Cai, Hang Xu, Jiachen Lu, Feng Wen, Wei Zhang, Li Zhang
TL;DR
LaneGraph2Seq reframes lane graph extraction as a sequence-to-sequence prediction problem by encoding both the vertices and the edges of a DAG $G=(V,E)$ as a token sequence. A BEV encoder is paired with a Transformer decoder that predicts a vertex sequence, serialized by depth-first traversal, and an edge sequence, serialized as a concise edge-based partition, so that road topology is learned end to end. At inference, classifier-free guidance combined with nucleus sampling improves lane connectivity and sampling diversity, reducing edge false negatives. Experiments on nuScenes and Argoverse 2 show state-of-the-art performance across standard lane-graph metrics, and ablations confirm the value of DFS-based serialization, edge-partition sequencing, and deeper Transformer layers. The approach offers a scalable, data-efficient pathway for robust lane-graph extraction in autonomous driving, with potential gains from large-scale pretraining on HD map data.
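To make the vertex-edge encoding concrete, here is a minimal Python sketch of how a small DAG might be serialized into a vertex sequence and an edge partition sequence. It illustrates the general idea only, and the exact token layout used by LaneGraph2Seq is not specified in this summary. The function name `serialize_lane_graph`, the `<sep>` token, and the choice to express edges as indices into the DFS order are all assumptions made for illustration.

```python
from collections import defaultdict

SEP = "<sep>"  # hypothetical separator closing each vertex's group of outgoing edges


def serialize_lane_graph(coords, edges):
    """Serialize a DAG into (vertex_seq, edge_seq).

    coords: dict mapping vertex id -> (x, y) position
    edges:  list of (src, dst) pairs forming a directed acyclic graph
    """
    succ = defaultdict(list)
    indegree = {v: 0 for v in coords}
    for src, dst in edges:
        succ[src].append(dst)
        indegree[dst] += 1

    order = []   # vertices in depth-first visit order
    seen = set()

    def dfs(v):
        if v in seen:
            return
        seen.add(v)
        order.append(v)
        for w in sorted(succ[v]):  # sorted for a deterministic traversal
            dfs(w)

    # Start from every vertex without incoming edges (lane entry points).
    for root in sorted(v for v in coords if indegree[v] == 0):
        dfs(root)

    position = {v: i for i, v in enumerate(order)}
    vertex_seq = [coords[v] for v in order]

    # One partition per vertex: indices of its successors, then a separator.
    edge_seq = []
    for v in order:
        edge_seq.extend(sorted(position[w] for w in succ[v]))
        edge_seq.append(SEP)
    return vertex_seq, edge_seq


coords = {"a": (0, 0), "b": (1, 0), "c": (2, 1), "d": (2, -1)}
edges = [("a", "b"), ("b", "c"), ("b", "d")]
vertex_seq, edge_seq = serialize_lane_graph(coords, edges)
print(vertex_seq)
print(edge_seq)
```

In this toy fork, vertex `b` splits into `c` and `d`, so its partition holds two successor indices while the leaf vertices contribute empty partitions. Because the edges are written out as explicit tokens, a decoder trained on such sequences has to predict connectivity directly rather than leaving it implicit.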
Abstract
Understanding road structures is crucial for autonomous driving. Intricate road structures are often depicted using lane graphs, which include centerline curves and connections forming a Directed Acyclic Graph (DAG). Accurate extraction of lane graphs relies on precisely estimating vertex and edge information within the DAG. Recent research highlights Transformer-based language models' impressive sequence prediction abilities, making them effective for learning graph representations when graph data are encoded as sequences. However, existing studies focus mainly on modeling vertices explicitly, leaving edge information simply embedded in the network. Consequently, these approaches fall short in the task of lane graph extraction. To address this, we introduce LaneGraph2Seq, a novel approach for lane graph extraction. It leverages a language model with vertex-edge encoding and connectivity enhancement. Our serialization strategy includes a vertex-centric depth-first traversal and a concise edge-based partition sequence. Additionally, we use classifier-free guidance combined with nucleus sampling to improve lane connectivity. We validate our method on prominent datasets, nuScenes and Argoverse 2, showcasing consistent and compelling results. Our LaneGraph2Seq approach demonstrates superior performance compared to state-of-the-art techniques in lane graph extraction.
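As a rough illustration of the connectivity enhancement step, the following Python sketch combines classifier-free guidance with nucleus (top-p) sampling for a single decoding step. It uses the common guidance formulation, in which unconditional logits are extrapolated toward conditional logits by a scale factor. Whether LaneGraph2Seq uses exactly this form, and which scale and `top_p` values it uses, is not stated here, so the function names and default values below are assumptions.

```python
import math
import random


def guided_logits(cond, uncond, scale):
    """Classifier-free guidance: move from unconditional toward conditional logits."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_filter(probs, top_p):
    """Keep the smallest set of top tokens whose cumulative mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}  # renormalized over the nucleus


def sample_token(cond, uncond, scale=1.5, top_p=0.9, rng=random):
    """Sample one token id from the guided, nucleus-filtered distribution."""
    nucleus = nucleus_filter(softmax(guided_logits(cond, uncond, scale)), top_p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy logits over a 4-token vocabulary (illustrative values only).
cond_logits = [2.0, 1.0, 0.5, -1.0]    # decoder conditioned on BEV features
uncond_logits = [1.0, 1.0, 1.0, 1.0]   # decoder with the condition dropped
print(sample_token(cond_logits, uncond_logits, rng=random.Random(0)))
```

With a scale above 1, tokens that the image-conditioned decoder favors over the unconditional one are amplified, which is one plausible way guidance could favor edge tokens that would otherwise be missed. The nucleus filter then removes the low-probability tail before sampling.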
