LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement

Renyuan Peng, Xinyue Cai, Hang Xu, Jiachen Lu, Feng Wen, Wei Zhang, Li Zhang

TL;DR

LaneGraph2Seq reframes lane graph extraction as a sequence-to-sequence prediction problem by encoding both vertices and edges of a DAG $G=(V,E)$ into a token sequence. It combines a BEV encoder with a Transformer decoder to predict a vertex sequence and an edge sequence via a vertex-edge encoding and a DFS-based serialization, enabling end-to-end learning of road topology. Inference is augmented with classifier-free guidance and nucleus sampling to enhance connectivity and sampling diversity, reducing edge false negatives. Extensive experiments on nuScenes and Argoverse 2 demonstrate state-of-the-art performance across standard lane-graph metrics, with ablations confirming the value of DFS-based serialization, edge-partition sequencing, and deeper Transformer layers. The approach offers a scalable, data-efficient pathway for robust lane-graph extraction in autonomous driving, with potential gains from large-scale pretraining on HD map data.
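
The vertex-edge encoding with DFS-based serialization can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: coordinates are assumed to be already quantized to integers, ties between roots and between children are broken by coordinate order, and the `<sep>` and `<EOS>` tokens as well as the function name `serialize_lane_graph` are hypothetical. Only `<start>` and `<EOV>` appear in the source (Figure 4).

```python
from collections import defaultdict

# <start> and <EOV> are named in the paper's Figure 4 caption;
# <sep> and <EOS> are hypothetical tokens introduced for this sketch.
START, EOV, SEP, EOS = "<start>", "<EOV>", "<sep>", "<EOS>"


def serialize_lane_graph(coords, edges):
    """Turn a DAG into (vertex_seq, edge_seq).

    coords: {vertex_id: (x, y)} with integer (already quantized) coordinates.
    edges:  list of (src, dst) pairs describing directed connections.
    """
    children = defaultdict(list)
    indegree = {v: 0 for v in coords}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    # Assumption: start from in-degree-zero vertices, ordered by coordinates.
    roots = sorted((v for v in coords if indegree[v] == 0), key=coords.get)

    order, seen = [], set()

    def dfs(v):
        if v in seen:
            return
        seen.add(v)
        order.append(v)
        # Assumption: children are visited in coordinate order.
        for child in sorted(children[v], key=coords.get):
            dfs(child)

    for root in roots:
        dfs(root)

    # Vertex sequence: <start>, then x y per vertex in DFS order, then <EOV>.
    vertex_seq = [START]
    for v in order:
        vertex_seq.extend(coords[v])
    vertex_seq.append(EOV)

    # Edge sequence: one partition per vertex (in the same order) listing the
    # DFS indices of its successors, each partition closed by <sep>.
    index = {v: i for i, v in enumerate(order)}
    edge_seq = []
    for v in order:
        edge_seq.extend(sorted(index[c] for c in children[v]))
        edge_seq.append(SEP)
    edge_seq.append(EOS)
    return vertex_seq, edge_seq
```

Because the edge partitions follow the vertex order, a decoder that has emitted the vertex sequence can attach each partition to a vertex by position alone, which matches the Figure 4 remark that edge order aligns with point order.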

Abstract

Understanding road structures is crucial for autonomous driving. Intricate road structures are often depicted using lane graphs, which include centerline curves and connections forming a Directed Acyclic Graph (DAG). Accurate extraction of lane graphs relies on precisely estimating vertex and edge information within the DAG. Recent research highlights Transformer-based language models' impressive sequence prediction abilities, making them effective for learning graph representations when graph data are encoded as sequences. However, existing studies focus mainly on modeling vertices explicitly, leaving edge information simply embedded in the network. Consequently, these approaches fall short in the task of lane graph extraction. To address this, we introduce LaneGraph2Seq, a novel approach for lane graph extraction. It leverages a language model with vertex-edge encoding and connectivity enhancement. Our serialization strategy includes a vertex-centric depth-first traversal and a concise edge-based partition sequence. Additionally, we use classifier-free guidance combined with nucleus sampling to improve lane connectivity. We validate our method on prominent datasets, nuScenes and Argoverse 2, showcasing consistent and compelling results. Our LaneGraph2Seq approach demonstrates superior performance compared to state-of-the-art techniques in lane graph extraction.
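
How classifier-free guidance and nucleus sampling combine at a single decoding step can be sketched as follows. This is a generic illustration, not the paper's code: it assumes the common guidance form `uncond + scale * (cond - uncond)`, assumes the unconditional logits come from a forward pass without the BEV features, and uses hypothetical names (`guided_logits`, `nucleus_filter`, `sample_token`) and default values chosen only for the example.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guided_logits(cond, uncond, scale):
    """Classifier-free guidance: push logits away from the unconditional ones.

    scale = 0 gives the unconditional logits, scale = 1 the conditional ones,
    and scale > 1 amplifies what the conditioning contributes.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def nucleus_filter(probs, top_p):
    """Keep the smallest set of top tokens whose mass reaches top_p, renormalized."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


def sample_token(cond, uncond, scale=1.5, top_p=0.9, rng=random):
    """Draw one token id from the guided, nucleus-truncated distribution."""
    dist = nucleus_filter(softmax(guided_logits(cond, uncond, scale)), top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

In this sketch the guidance step sharpens tokens that the image conditioning supports, while the nucleus step drops the low-probability tail before sampling; the scale and `top_p` values would need tuning for any real model.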

Paper Structure (27 sections, 4 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: The extraction of a lane graph, which captures the centerline curves and their intricate connectivity relationships on the road, is crucial for the perception system of autonomous driving. Our proposed LaneGraph2Seq framework adeptly addresses this challenge by utilizing multi-camera images as input.
  • Figure 2: Our LaneGraph2Seq employs a BEV encoder to transform features from the front-view image to the bird's-eye-view plane. Subsequently, a Transformer decoder generates the tokens of the target sequence one at a time, guided by the prior tokens and the encoded BEV feature.
  • Figure 3: This depiction outlines the procedure for constructing a sequence that represents the actual road structure. The upper part illustrates the abstraction of the real road into a Directed Acyclic Graph (DAG), while the middle section showcases the detailed process of encoding vertices and edges. The lower part exhibits the vertex sequence and edge sequence that result from vertex-edge encoding, which are then combined to form the complete lane graph sequence.
  • Figure 4: Examples of serialization order. The right side shows example vertex sequences produced by different sorting methods. Start denotes the <start> token, while EOV denotes the <EOV> token. The differently colored x y entries indicate the x and y coordinate values of the points with the corresponding colors on the left. As the edge order aligns with the point order, we present only the vertex sequence.
  • Figure 5: Our qualitative results on the nuScenes (Caesar et al., 2020) validation set. Our approach attains highly accurate predictions, with only a slight error in the red-boxed section.
  • ...and 2 more figures