CDKFormer: Contextual Deviation Knowledge-Based Transformer for Long-Tail Trajectory Prediction
Yuansheng Lian, Ke Zhang, Meng Li
TL;DR
CDKFormer tackles long-tail trajectory prediction for autonomous vehicles, where rare and challenging scenarios are predicted poorly, by introducing contextual deviation features and a dual query-based Transformer decoder. It jointly encodes scene context and deviation status, then decodes with mode and dual future queries through a multi-stream decoder to generate robust multimodal trajectories. The method achieves state-of-the-art results on Argoverse 2 and inD, with strong tail performance demonstrated via CVaR analysis and comprehensive ablations. It also emphasizes the need for future work on map-aware deviation modeling and causal analysis of tail failures to further improve safety and reliability in real-world traffic.
Abstract
Predicting the future movements of surrounding vehicles is essential for ensuring the safe operation and efficient navigation of autonomous vehicles (AVs) in urban traffic environments. Existing vehicle trajectory prediction methods primarily focus on improving overall performance, yet they struggle to address long-tail scenarios effectively. This limitation often leads to poor predictions in rare cases, significantly increasing the risk of safety incidents. Taking the Argoverse 2 motion forecasting dataset as an example, we first investigate the long-tail characteristics of trajectory samples from two perspectives, individual motion and group interaction, and derive deviation features that distinguish abnormal from regular scenarios. On this basis, we propose CDKFormer, a Contextual Deviation Knowledge-based Transformer model for long-tail trajectory prediction. CDKFormer integrates an attention-based scene context fusion module to encode spatiotemporal interaction and road topology. An additional deviation feature fusion module captures the dynamic deviations in the target vehicle's status. We further introduce a dual query-based decoder, supported by a multi-stream decoder block, to sequentially decode heterogeneous scene deviation features and generate multimodal trajectory predictions. Extensive experiments demonstrate that CDKFormer achieves state-of-the-art performance, significantly enhancing prediction accuracy and robustness for long-tail trajectories compared to existing methods, thus advancing the reliability of AVs in complex real-world environments.
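The TL;DR refers to tail performance measured via CVaR. As a minimal sketch, assuming CVaR here means the mean of the worst alpha fraction of per-sample prediction errors (this section does not state the paper's exact definition, error metric, or alpha), the quantity could be computed as follows. The function name `cvar` and the error values are hypothetical and chosen only for illustration.

```python
import math


def cvar(errors, alpha=0.05):
    """Mean of the worst ceil(alpha * n) errors (upper-tail average).

    Assumption: larger error means a worse prediction, so the tail of
    interest is the upper end of the error distribution.
    """
    if not errors:
        raise ValueError("errors must be non-empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    k = max(1, math.ceil(alpha * len(errors)))
    worst = sorted(errors, reverse=True)[:k]
    return sum(worst) / k


# Hypothetical per-sample displacement errors in metres (not from the paper).
errors = [0.5, 0.6, 0.4, 0.7, 0.5, 0.8, 6.0, 9.0]

mean_error = sum(errors) / len(errors)   # average over all samples
tail_error = cvar(errors, alpha=0.25)    # average over the worst 25 percent

print(f"mean error: {mean_error:.4f} m, CVaR(0.25): {tail_error:.4f} m")
```

In this toy data the two rare, badly predicted samples barely move the overall mean but dominate the tail average, which is why a tail-focused metric is reported alongside average accuracy for long-tail evaluation.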
