A Structure-Aware Lane Graph Transformer Model for Vehicle Trajectory Prediction

Sun Zhanbo, Dong Caiyin, Ji Ang, Zhao Ruibin, Zhao Yu

TL;DR

The paper tackles autonomous-vehicle trajectory prediction by making Transformer attention aware of map structure through bias terms and topology encodings. It introduces four Relative Positional Encoding (RPE) matrices and two Shortest Path Distance (SPD) matrices to embed lane connectivity and shortest-path information, complemented by local attention that focuses on nearby lanes. The architecture (AgentNet, MapNet, FusionNet, and a multi-head prediction header) is evaluated on Argoverse 2, where it reduces $\text{minFDE}_6$ by 60.73% relative to the Nearest Neighbor baseline and $\text{b-minFDE}_6$ by 2.65% relative to LaneGCN. This structure-aware approach improves prediction accuracy and provides a pathway toward more reliable planning and control in real-world autonomous driving systems.
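As a reading aid for the two metrics named above, here is a minimal sketch of how $\text{minFDE}_K$ and $\text{b-minFDE}_K$ are commonly computed for Argoverse-style benchmarks: the smallest endpoint error over $K$ candidate trajectories, plus a Brier-style penalty $(1 - p)^2$ on the probability $p$ assigned to that best candidate. The function names and the toy inputs are hypothetical, and this is not the paper's evaluation code.

```python
import math


def min_fde(pred_endpoints, gt_endpoint):
    """Return (smallest final displacement error, index of that candidate).

    pred_endpoints: list of K (x, y) predicted final positions (hypothetical input layout).
    gt_endpoint:    (x, y) ground-truth final position.
    """
    errors = [math.dist(p, gt_endpoint) for p in pred_endpoints]
    best = min(range(len(errors)), key=errors.__getitem__)
    return errors[best], best


def brier_min_fde(pred_endpoints, probs, gt_endpoint):
    """minFDE plus the penalty (1 - p)^2, where p is the probability of the best candidate."""
    fde, best = min_fde(pred_endpoints, gt_endpoint)
    return fde + (1.0 - probs[best]) ** 2


# Toy example with K = 2 candidates (made-up numbers for illustration only).
endpoints = [(0.0, 0.0), (3.0, 4.0)]
probabilities = [0.5, 0.5]
ground_truth = (3.0, 0.0)
fde, best_index = min_fde(endpoints, ground_truth)
b_fde = brier_min_fde(endpoints, probabilities, ground_truth)
```

Because the penalty depends only on the probability of the best candidate, a model can lower $\text{b-minFDE}_K$ either by moving that endpoint closer to the ground truth or by assigning it higher confidence.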

Abstract

Accurate prediction of future trajectories for surrounding vehicles is vital for the safe operation of autonomous vehicles. This study proposes a Lane Graph Transformer (LGT) model with structure-aware capabilities. Its key contribution lies in encoding the map topology structure into the attention mechanism. To address variations in lane information from different directions, four Relative Positional Encoding (RPE) matrices are introduced to capture the local details of the map topology structure. Additionally, two Shortest Path Distance (SPD) matrices are employed to capture distance information between two accessible lanes. Numerical results indicate that the proposed LGT model achieves significantly higher prediction performance on the Argoverse 2 dataset. Specifically, the minFDE$_6$ metric decreased by 60.73% compared to the Argoverse 2 baseline model (Nearest Neighbor), and the b-minFDE$_6$ metric decreased by 2.65% compared to the baseline LaneGCN model. Furthermore, ablation experiments demonstrated that accounting for the map topology structure led to a 4.24% drop in the b-minFDE$_6$ metric, validating the effectiveness of the model.
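To make the idea of a shortest-path distance between two accessible lanes concrete, the sketch below builds a hop-count SPD matrix with breadth-first search over a directed lane graph. It rests on several assumptions not stated in the abstract: lanes are integer-indexed nodes, `successors` is a hypothetical adjacency mapping, distance is counted in hops, and unreachable pairs are marked with `-1`. How the paper constructs its two SPD matrices is not specified here.

```python
from collections import deque


def spd_matrix(successors, num_lanes, unreachable=-1):
    """Hop-count shortest-path distances between all lane pairs via BFS.

    successors:  dict mapping a lane index to the lane indices reachable in one step
                 (hypothetical representation of lane connectivity).
    num_lanes:   total number of lane nodes.
    unreachable: marker for pairs with no directed path between them.
    """
    spd = [[unreachable] * num_lanes for _ in range(num_lanes)]
    for src in range(num_lanes):
        spd[src][src] = 0
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in successors.get(cur, ()):
                if spd[src][nxt] == unreachable:  # first visit is the shortest path
                    spd[src][nxt] = spd[src][cur] + 1
                    queue.append(nxt)
    return spd


# Toy lane graph: 0 -> 1 -> 2, with a branch 0 -> 3.
toy_successors = {0: [1, 3], 1: [2]}
toy_spd = spd_matrix(toy_successors, num_lanes=4)
```

The resulting matrix is asymmetric because lane connectivity is directed: lane 2 is two hops downstream of lane 0, while lane 0 is not reachable from lane 2.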

Paper Structure (21 sections, 17 equations, 6 figures, 3 tables, 1 algorithm)

This paper contains 21 sections, 17 equations, 6 figures, 3 tables, and 1 algorithm.

Figures (6)

  • Figure 1: The critical issue addressed by the LGT model: compared to the standard Transformer, it integrates connectivity information between lanes, thus becoming structure-aware.
  • Figure 2: The overall framework of the proposed model, including AgentNet, MapNet, FusionNet, and Decoder.
  • Figure 3: An illustration of the LGT framework. It embeds map topology information into the attention matrix by introducing three bias matrices B, $D_{inter}$, $D_{outer}$ (highlighted in red font). Note that $D_{inter}$ and $D_{outer}$ have a similar structure but distinct weights.
  • Figure 4: The b-minFDE$_6$ metric with different numbers of neighbors under the A2A, A2L, and L2A interaction modes.
  • Figure 5: Qualitative results on the Argoverse 2 validation set, with embedded map topology matrices and local attention.
  • ...and 1 more figures
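
The Figure 3 caption describes bias matrices being embedded into the attention matrix. One common way to realize this, assumed here rather than taken from the paper, is to add the bias terms to the scaled dot-product logits before the softmax. The sketch below uses plain Python lists; the function names are hypothetical, and the `-inf` mask in the usage example is only one possible way to restrict attention to nearby lanes.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(q, k, v, biases):
    """Single-head attention with additive bias matrices (an assumed formulation).

    q, k, v: lists of equal-length feature vectors, one per lane node.
    biases:  list of N x N matrices (for example topology or distance biases)
             summed into the attention logits before the softmax.
    """
    dim = len(q[0])
    out = []
    for i, q_i in enumerate(q):
        logits = []
        for j, k_j in enumerate(k):
            score = sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(dim)
            score += sum(bias[i][j] for bias in biases)
            logits.append(score)
        weights = softmax(logits)
        out.append([sum(w * row[c] for w, row in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


# Two lane nodes with identical queries and keys, so only the bias shapes the weights.
q = k = [[0.0, 0.0], [0.0, 0.0]]
v = [[1.0, 0.0], [0.0, 1.0]]
no_bias = [[0.0, 0.0], [0.0, 0.0]]
local_mask = [[0.0, float("-inf")], [0.0, 0.0]]  # node 0 may not attend to node 1
uniform_out = biased_attention(q, k, v, [no_bias])
masked_out = biased_attention(q, k, v, [no_bias, local_mask])
```

Each row of a mask must leave at least one finite entry; a row that is entirely `-inf` would make the softmax undefined.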