Trajectory Prediction for Autonomous Driving Using a Transformer Network

Zhenning Li; Hao Yu

TL;DR

The paper tackles predicting future motions of surrounding agents for autonomous driving by fusing scene context with historical trajectories through a Context-Aware Transformer (CATF). It extends the model with a lighter CATF_l variant that uses linear attention for efficiency, and introduces a multimodal prediction framework with an off-road loss to enforce feasibility. Empirically, CATF and CATF_l achieve state-of-the-art performance on Lyft l5kit across multiple metrics, with CATF_l offering substantially faster inference and lower memory use. The work improves prediction plausibility and safety by aligning forecasts with drivable regions while maintaining high accuracy, enabling more reliable autonomous driving decisions.
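
To make the efficiency claim for the linear-attention variant concrete, here is a minimal NumPy sketch of kernelized linear attention. It is an illustration under assumptions, not the authors' CATF_l code: the feature map `elu(x) + 1` and the names `elu_plus_one` and `linear_attention` are hypothetical choices, and the summary above does not say which kernel the paper uses.

```python
import numpy as np


def elu_plus_one(x):
    # Positive feature map phi(x) = elu(x) + 1 (an assumed choice of kernel).
    # For x > 0 this is x + 1; for x <= 0 it is exp(x).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v, eps=1e-6):
    """Kernelized attention without forming the (n, m) attention matrix.

    q: (n, d) queries, k: (m, d) keys, v: (m, d_v) values -> (n, d_v).
    """
    phi_q = elu_plus_one(q)
    phi_k = elu_plus_one(k)
    kv = phi_k.T @ v              # (d, d_v): keys and values summarized once
    z = phi_k.sum(axis=0)         # (d,): normalizer shared by all queries
    numerator = phi_q @ kv        # (n, d_v)
    denominator = phi_q @ z + eps # (n,)
    return numerator / denominator[:, None]
```

Because `phi_k.T @ v` is computed once and reused for every query, the cost grows linearly with sequence length instead of quadratically, which is the usual motivation for swapping softmax attention for a linear form.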

Abstract

Predicting the trajectories of surrounding agents is still considered one of the most challenging tasks for autonomous driving. In this paper, we introduce a multi-modal trajectory prediction framework based on the transformer network. The semantic map of each agent is fed to convolutional networks to automatically derive relevant contextual information. A novel auxiliary loss that penalizes infeasible off-road predictions is also proposed in this study. Experiments on the Lyft l5kit dataset show that the proposed model achieves state-of-the-art performance, substantially improving the accuracy and feasibility of the prediction outcomes.
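
The idea of penalizing off-road predictions can be sketched as follows. This is a hypothetical illustration, not the paper's loss: it assumes a binary drivable-area raster and trajectories already expressed in pixel coordinates, and the function name `off_road_penalty` is made up for this example. The lookup below is not differentiable, so a real training loss would need a smooth surrogate, such as a distance to the drivable region.

```python
import numpy as np


def off_road_penalty(traj_px, drivable_mask):
    """Fraction of predicted waypoints that land on non-drivable pixels.

    traj_px: (T, 2) array of (x, y) pixel coordinates.
    drivable_mask: (H, W) array with 1 = drivable and 0 = off-road.
    """
    h, w = drivable_mask.shape
    # Round to the nearest pixel; points outside the raster are clamped
    # to the border cell (a simplification made for this sketch).
    cols = np.clip(np.rint(traj_px[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.rint(traj_px[:, 1]).astype(int), 0, h - 1)
    return float(np.mean(1.0 - drivable_mask[rows, cols]))
```

A penalty of this kind would be added to the main regression or classification loss with a weighting factor, so that trajectories leaving the drivable region cost more than equally inaccurate ones that stay on the road.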

Paper Structure (22 sections, 12 equations, 4 figures, 1 table)

Introduction
Problem Formulation
Context-Aware Transformer Model
Encoder-Decoder Framework
A. Input and Output
B. Positional Encoding
C. Multi-head Attention
D. FFN and Add & Norm Layers
Loss Functions
Experimental Analysis and Evaluations
Dataset
Baselines
Constant Velocity and Yaw
Multiple-Trajectory Prediction (MTP)
Trajectron
...and 7 more sections

Figures (4)

Figure 1: Structure of Transformer Network
Figure 2: Frameworks of Scaled Dot-Product Attention, Multi-Head Attention, and Multi-Head Linear Attention
Figure 3: Examples of different BEV scene rasterization maps including AV (green rectangle) and TVs (blue rectangle).
Figure 4: Model comparison in going-straight scenes (upper three) and turning scenes (lower three). The green rectangle represents the target agent, blue rectangles represent other agents, and the darker afterimages indicate the history trajectory ($K=3$, $h=1s$, and $H=5s$).
