Trajectory Prediction for Autonomous Driving Using a Transformer Network
Zhenning Li, Hao Yu
TL;DR
The paper tackles the problem of predicting the future motion of surrounding agents for autonomous driving by fusing scene context with historical trajectories through a Context-Aware Transformer (CATF). It extends the model with a lighter variant, CATF_l, that uses linear attention for efficiency, and introduces a multimodal prediction framework with an off-road loss that enforces feasibility. Empirically, CATF and CATF_l achieve state-of-the-art performance on the Lyft l5kit dataset across multiple metrics, with CATF_l additionally offering substantially faster inference and lower memory use. By aligning forecasts with drivable regions while maintaining high accuracy, the work improves the plausibility and safety of predictions and enables more reliable autonomous driving decisions.
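The efficiency argument behind CATF_l can be made concrete with a minimal NumPy sketch contrasting standard softmax attention with kernelized linear attention. It assumes the feature map phi(x) = elu(x) + 1 from the linear-transformer literature. The summary does not say which linear-attention formulation CATF_l actually uses, and all function names here are hypothetical. The point is the cost: softmax attention builds an N-by-M score matrix, while linear attention first summarizes keys and values into a d-by-d_v matrix, so time and memory grow linearly with sequence length.

```python
import numpy as np


def elu_plus_one(x):
    # Strictly positive feature map phi(x) = elu(x) + 1 (an assumed kernel choice)
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def softmax_attention(q, k, v):
    # Standard scaled dot-product attention: builds an (N, M) score matrix
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def linear_attention(q, k, v, eps=1e-6):
    # Kernelized attention: phi(Q) (phi(K)^T V) / (phi(Q) phi(K)^T 1)
    # Reassociating the product avoids materializing the (N, M) matrix
    phi_q = elu_plus_one(q)        # (N, d)
    phi_k = elu_plus_one(k)        # (M, d)
    kv = phi_k.T @ v               # (d, d_v): keys and values summarized once
    z = phi_k.sum(axis=0)          # (d,): normalizer summary
    numer = phi_q @ kv             # (N, d_v)
    denom = phi_q @ z + eps        # (N,)
    return numer / denom[:, None]


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((50, 16)) for _ in range(3))
print(linear_attention(q, k, v).shape)    # (50, 16)
print(softmax_attention(q, k, v).shape)   # (50, 16)
```

Because phi is strictly positive, each output row is still a weighted average of the rows of v, only with kernel weights instead of softmax weights. This keeps an attention-like inductive bias while dropping the quadratic term.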
Abstract
Predicting the trajectories of surrounding agents is still considered one of the most challenging tasks in autonomous driving. In this paper, we introduce a multi-modal trajectory prediction framework based on the transformer network. The semantic map of each agent is fed to convolutional networks that automatically extract relevant contextual information. We also propose a novel auxiliary loss that penalizes infeasible off-road predictions. Experiments on the Lyft l5kit dataset show that the proposed model achieves state-of-the-art performance, substantially improving both the accuracy and the feasibility of the predicted trajectories.
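The two training ingredients named in the abstract, a multi-modal prediction head and an off-road auxiliary loss, can be sketched as follows. This is an illustration under stated assumptions, not the paper's formulation: the model is assumed to output K candidate trajectories with confidences; the multi-modal term is assumed to be a unit-variance Gaussian mixture negative log-likelihood (a common choice on Lyft l5kit); and the off-road term is assumed to be the confidence-weighted fraction of predicted points that land outside a boolean drivable-area raster. All function names and the weight `lam` are hypothetical.

```python
import numpy as np


def multimodal_nll(gt, preds, confidences):
    """NLL of gt under a K-mode mixture of unit-variance Gaussians (assumed form).

    gt: (T, 2) ground-truth positions; preds: (K, T, 2) candidate trajectories;
    confidences: (K,) mode probabilities summing to 1. Constant terms are dropped.
    """
    sq_err = ((preds - gt[None]) ** 2).sum(axis=(1, 2))  # (K,) squared error per mode
    log_terms = np.log(confidences) - 0.5 * sq_err       # (K,) log-weight + log-likelihood
    m = log_terms.max()                                  # stabilize the log-sum-exp
    return float(-(m + np.log(np.exp(log_terms - m).sum())))


def off_road_penalty(preds, confidences, drivable):
    """Confidence-weighted fraction of predicted points that fall off-road.

    preds are in raster pixel coordinates (x = column, y = row);
    drivable is an (H, W) boolean mask. Points outside the raster count as off-road.
    """
    H, W = drivable.shape
    cols = np.round(preds[..., 0]).astype(int)           # (K, T)
    rows = np.round(preds[..., 1]).astype(int)           # (K, T)
    inside = (cols >= 0) & (cols < W) & (rows >= 0) & (rows < H)
    on_road = drivable[np.clip(rows, 0, H - 1), np.clip(cols, 0, W - 1)]
    off = ~inside | ~on_road                             # (K, T) True where infeasible
    per_mode = off.mean(axis=1)                          # (K,) off-road fraction per mode
    return float((confidences * per_mode).sum())


def total_loss(gt, preds, confidences, drivable, lam=1.0):
    # Hypothetical weighting: lam trades prediction accuracy against feasibility
    return multimodal_nll(gt, preds, confidences) + lam * off_road_penalty(
        preds, confidences, drivable
    )


drivable = np.zeros((10, 10), dtype=bool)
drivable[:, :5] = True                                   # left half of the raster is road
gt = np.array([[1.0, 1.0], [2.0, 2.0]])
preds = np.array([[[1.0, 1.0], [2.0, 2.0]],              # mode 0: on-road, matches gt
                  [[7.0, 1.0], [8.0, 2.0]]])             # mode 1: off-road
conf = np.array([0.5, 0.5])
print(off_road_penalty(preds, conf, drivable))           # 0.5
```

The off-road indicator here is hard, so it is not differentiable with respect to the predicted coordinates. A trainable version would need a soft relaxation (for example, sampling a smoothed distance-to-road map), and the abstract does not specify which one the authors use.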
