Learning Lane Graph Representations for Motion Forecasting

Ming Liang, Bin Yang, Rui Hu, Yun Chen, Renjie Liao, Song Feng, Raquel Urtasun

TL;DR

This work addresses the challenge of motion forecasting in autonomous driving by replacing rasterized map inputs with a structured lane graph derived from vectorized HD-map data. It introduces LaneGCN, a graph-convolutional architecture with multi-type and dilated operations to capture lane topology, and couples it with ActorNet and FusionNet to model rich actor–map interactions. The approach enables explicit, topology-aware map representations and four interaction channels (A2L, L2L, L2A, A2A), achieving substantial improvements on the Argoverse benchmark. The results demonstrate the practical impact of using lane graphs and fusion-based interactions for accurate, multi-modal trajectory prediction in real-world driving scenarios.
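To make the lane graph idea concrete, here is a minimal NumPy sketch that turns vectorized centerlines into lane nodes with predecessor and successor adjacency matrices. It assumes that each lane node is the segment between two consecutive centerline points, located at the segment midpoint. Several parts are hypothetical and are not the authors' code: the input format (a dict of lane IDs to BEV point lists plus a successor map), the function name `build_lane_graph`, and the choice of the segment displacement as the node feature. Left and right neighbor adjacency is omitted for brevity.

```python
import numpy as np


def build_lane_graph(centerlines, successors):
    """Sketch: convert centerline polylines into lane nodes and pre/suc adjacency.

    centerlines: dict lane_id -> sequence of (x, y) BEV points (hypothetical format)
    successors:  dict lane_id -> list of lane_ids that continue it (hypothetical)
    Returns node centers (N, 2), node features (N, 2), A_pre (N, N), A_suc (N, N).
    Left/right neighbor adjacency is omitted in this sketch.
    """
    ctrs, feats, suc_edges = [], [], []
    first_node, last_node = {}, {}
    for lane_id, pts in centerlines.items():
        pts = np.asarray(pts, dtype=float)
        assert len(pts) >= 2, "a lane needs at least two points to form a segment"
        start = len(ctrs)
        for k in range(len(pts) - 1):
            # One node per segment between consecutive points.
            ctrs.append((pts[k] + pts[k + 1]) / 2.0)  # node location: segment midpoint
            feats.append(pts[k + 1] - pts[k])         # node feature: segment displacement
            if k > 0:
                # Within a lane, node (start + k) follows node (start + k - 1).
                suc_edges.append((start + k - 1, start + k))
        first_node[lane_id] = start
        last_node[lane_id] = len(ctrs) - 1
    # Across lanes, the last node of a lane links to the first node of each successor lane.
    for lane_id, nxt in successors.items():
        for s in nxt:
            suc_edges.append((last_node[lane_id], first_node[s]))
    n = len(ctrs)
    A_suc = np.zeros((n, n))
    for u, v in suc_edges:
        A_suc[u, v] = 1.0  # row u gathers features from its successor v
    A_pre = A_suc.T.copy()  # row v gathers features from its predecessor u
    return np.array(ctrs), np.array(feats), A_pre, A_suc


ctrs, feats, A_pre, A_suc = build_lane_graph(
    {"a": [(0, 0), (1, 0), (2, 0)], "b": [(2, 0), (3, 0)]},
    {"a": ["b"]},
)
print(ctrs.shape, int(A_suc.sum()))  # (3, 2) 2
```

Storing predecessors as the transpose of successors keeps the two relations consistent by construction, so each can feed its own weight matrix in a graph convolution.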

Abstract

We propose a motion forecasting model that exploits a novel structured map representation as well as actor-map interactions. Instead of encoding vectorized maps as raster images, we construct a lane graph from raw map data to explicitly preserve the map structure. To capture the complex topology and long-range dependencies of the lane graph, we propose LaneGCN, which extends graph convolutions with multiple adjacency matrices and along-lane dilation. To capture the complex interactions between actors and maps, we exploit a fusion network consisting of four types of interactions: actor-to-lane, lane-to-lane, lane-to-actor and actor-to-actor. Powered by LaneGCN and actor-map interactions, our model is able to predict accurate and realistic multi-modal trajectories. Our approach significantly outperforms the state of the art on the large-scale Argoverse motion forecasting benchmark.
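The following is a rough NumPy sketch of a LaneConv-style operator. It assumes the form Y = X W_0 + Σ_i A_i X W_i over the left, right, predecessor and successor adjacency matrices. Along-lane dilation is implemented by raising the predecessor and successor matrices to the power k, so that a node aggregates from its k-th predecessor or successor. The function name `lane_conv` and the weight-dictionary layout are hypothetical. Passing `dilations=(1, 2, 4, 8, 16, 32)` mirrors the LaneConv(1,2,4,8,16,32) listed in Figure 4.

```python
import numpy as np


def lane_conv(X, adj, weights, dilations=(1,)):
    """Sketch of one LaneConv-style layer.

    X: (N, C) lane node features
    adj: dict "pre", "suc", "left", "right" -> (N, N) 0/1 adjacency matrices
    weights: dict of (C, C_out) matrices with keys "self", "left", "right",
             and ("pre", k), ("suc", k) for each dilation k (hypothetical layout)
    """
    Y = X @ weights["self"]                      # node's own features
    for side in ("left", "right"):
        Y += adj[side] @ X @ weights[side]       # lateral neighbors, no dilation
    for k in dilations:
        for rel in ("pre", "suc"):
            # A^k reaches k steps along the lane; entries count k-step paths,
            # so they can exceed 1 where lanes fork or merge.
            A_k = np.linalg.matrix_power(adj[rel], k)
            Y += A_k @ X @ weights[(rel, k)]
    return Y


rng = np.random.default_rng(0)
N, C = 4, 8
A_suc = np.eye(N, k=1)  # a single chain: node i's successor is node i + 1
adj = {"suc": A_suc, "pre": A_suc.T,
       "left": np.zeros((N, N)), "right": np.zeros((N, N))}
dil = (1, 2, 4)
keys = ["self", "left", "right"] + [(r, k) for r in ("pre", "suc") for k in dil]
W = {key: rng.normal(scale=0.1, size=(C, C)) for key in keys}
Y = lane_conv(rng.normal(size=(N, C)), adj, W, dil)
print(Y.shape)  # (4, 8)
```

Dilation lets a single layer see far along a lane without stacking many layers, which is one way to capture the long-range dependencies mentioned above.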

Paper Structure

This paper contains 26 sections, 10 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Our approach: We construct a lane graph from raw map data and use LaneGCN to extract map features. In parallel, ActorNet extracts actor features from observed past trajectories. We then use FusionNet to model the interactions among actors and between actors and the map, and predict the future trajectories.
  • Figure 2: Overall architecture: Our model is composed of four modules. (1) ActorNet receives the past actor trajectories as input and uses 1D convolutions to extract actor node features. (2) MapNet constructs a lane graph from HD maps and uses a LaneGCN to extract lane node features. (3) FusionNet is a stack of 4 interaction blocks. The actor-to-lane block fuses real-time traffic information from actor nodes to lane nodes. The lane-to-lane block propagates information over the lane graph and updates lane features. The lane-to-actor block fuses updated map information from lane nodes to actor nodes. The actor-to-actor block performs interactions among actors. We use another LaneGCN for the lane-to-lane block and spatial attention layers for the other blocks. (4) The prediction header uses the fused actor features to produce multi-modal trajectories.
  • Figure 3: Lane graph construction from vectorized map data. Left: The lane centerline of interest, its predecessor, successor, left neighbor and right neighbor are denoted with red, orange, blue, purple and green lines, respectively. Each centerline is given as a sequence of BEV points (hollow circles). Right: The derived lane graph with an example lane node. The lane node of interest, its predecessor, successor, left neighbor and right neighbor are denoted with red, orange, blue, purple and green circles, respectively. See the lane graph construction section of the paper for more information.
  • Figure 4: LaneGCN architecture. Our LaneGCN is a stack of 4 multi-scale LaneConv residual blocks, each consisting of a LaneConv(1,2,4,8,16,32) and a linear layer with a residual connection. All layers have 128 feature channels.
  • Figure 5: Qualitative results on hard cases. From top to bottom, these hard cases involve missing the right turn mode, lacking history information, extreme deceleration and acceleration, respectively. See the text for more information.
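Figure 2 describes FusionNet's actor-to-lane, lane-to-actor and actor-to-actor blocks as spatial attention layers. As a hedged illustration only, the sketch below uses generic scaled dot-product attention with three assumptions: attention is restricted to context nodes within a distance threshold, relative BEV offsets are added to the keys, and a residual-style self term is included. The function name `distance_masked_attention`, the weight dictionary and the softmax form are all our choices and may differ from the paper's layer. Swapping roles gives the different blocks. For actor-to-lane, lane nodes are the queries and actors the context. For lane-to-actor and actor-to-actor, actors are the queries.

```python
import numpy as np


def distance_masked_attention(q_feats, q_pos, c_feats, c_pos, W, radius):
    """Sketch: fuse context nodes into query nodes, attending only within `radius`.

    q_feats (N, C), q_pos (N, 2): query nodes, e.g. lane nodes for actor-to-lane
    c_feats (M, C), c_pos (M, 2): context nodes, e.g. actor nodes
    W: dict of hypothetical weights "self", "q", "k", "v" (C, D) and "pos" (2, D)
    """
    delta = c_pos[None, :, :] - q_pos[:, None, :]      # (N, M, 2) relative offsets
    mask = np.linalg.norm(delta, axis=-1) <= radius    # (N, M) context nodes in range
    q = q_feats @ W["q"]                               # (N, D)
    k = (c_feats @ W["k"])[None] + delta @ W["pos"]    # (N, M, D) position-aware keys
    v = c_feats @ W["v"]                               # (M, D)
    scores = np.einsum("nd,nmd->nm", q, k) / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -1e9)              # suppress out-of-range pairs
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w = np.where(mask, w, 0.0)                         # queries with no neighbor keep all-zero weights
    w = w / np.maximum(w.sum(axis=1, keepdims=True), 1e-12)
    # Residual-style update: own projected features plus attended context.
    return q_feats @ W["self"] + w @ v


I = np.eye(2)
W = {"self": I, "q": I, "k": I, "v": I, "pos": np.zeros((2, 2))}
lanes, lane_pos = np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([[0.0, 0.0], [100.0, 0.0]])
actors, actor_pos = np.array([[2.0, 3.0]]), np.array([[1.0, 0.0]])
# The nearby lane node absorbs the actor's features; the distant one is left unchanged.
print(distance_masked_attention(lanes, lane_pos, actors, actor_pos, W, radius=10.0))
```

Masking by distance keeps each update local and cheap, which matters when a scene contains many lane nodes and actors.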