SAIL: Scene-aware Adaptive Iterative Learning for Long-Tail Trajectory Prediction in Autonomous Vehicles

Bin Rao, Haicheng Liao, Chengyue Wang, Keqiang Li, Zhenning Li, Hai Yang

Abstract

Autonomous vehicles (AVs) rely on accurate trajectory prediction for safe navigation in diverse traffic environments, yet existing models struggle with long-tail scenarios: rare but safety-critical events characterized by abrupt maneuvers, high collision risks, and complex interactions. These challenges stem from data imbalance, inadequate definitions of long-tail trajectories, and suboptimal learning strategies that prioritize common behaviors over infrequent ones. To address this, we propose SAIL, a novel framework that systematically tackles the long-tail problem by first defining and modeling trajectories across three key attribute dimensions: prediction error, collision risk, and state complexity. Our approach then combines an attribute-guided augmentation and feature extraction process with an adaptive contrastive learning strategy. This strategy employs a continuous cosine momentum schedule, similarity-weighted hard-negative mining, and a dynamic pseudo-labeling mechanism based on evolving feature clustering. Furthermore, it incorporates a focusing mechanism to intensify learning on hard-positive samples within each identified class. This design enables SAIL to identify and forecast diverse and challenging long-tail events. Extensive evaluations on the nuScenes and ETH/UCY datasets demonstrate SAIL's superior performance, achieving up to a 28.8% reduction in prediction error on the hardest 1% of long-tail samples compared to state-of-the-art baselines, while maintaining competitive accuracy across all scenarios. This framework advances reliable AV trajectory prediction in real-world, mixed-autonomy settings.
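
The abstract names a continuous cosine momentum schedule but does not give its formula here. As a rough illustration only, the sketch below assumes the cosine form commonly used for momentum encoders in contrastive learning, where the momentum rises smoothly from a base value to a final value over training. The function names, the default values (0.99 and 1.0), and the plain-list parameter update are assumptions made for this example, not details taken from SAIL.

```python
import math


def cosine_momentum(step: int, total_steps: int,
                    base: float = 0.99, final: float = 1.0) -> float:
    """Momentum coefficient at `step`, rising from `base` to `final`.

    Assumed form (not confirmed by the paper):
        m(t) = final - (final - base) * (cos(pi * t / T) + 1) / 2
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final - (final - base) * (math.cos(math.pi * progress) + 1.0) / 2.0


def ema_update(target, online, momentum):
    """Exponential moving average of parameters (hypothetical helper).

    Each target parameter moves toward its online counterpart by
    a factor of (1 - momentum).
    """
    return [momentum * t + (1.0 - momentum) * o for t, o in zip(target, online)]


if __name__ == "__main__":
    total = 100
    for step in (0, 25, 50, 75, 100):
        print(step, round(cosine_momentum(step, total), 5))
```

With this form the target encoder changes quickly early in training and is almost frozen near the end; any other smooth, monotone schedule could be substituted without changing the surrounding logic.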

Paper Structure

This paper contains 40 sections, 28 equations, 11 figures, 11 tables, 2 algorithms.

Figures (11)

  • Figure 1: Analyzing vehicle trajectories from the perspectives of Prediction Error, Risk (inverse time-to-collision, 1/TTC), and Vehicle State reveals their intrinsic long-tail nature. The top 5% of data in each distribution corresponds to distinct real-world scenarios, which are often critical to ensuring the safe operation of autonomous vehicles.
  • Figure 2: The overall architecture of our proposed SAIL framework. The framework takes historical trajectories and HD map data as input and processes them through a multi-stage pipeline, including the Scene Representation Learning module and the Attribute-aware Trajectory Generator, to output multiple future trajectories. Panels (b), (c), and (d) provide detailed views of our key components: the Multi-dimensional Long-Tail Attributes definition, the Attribute Disentanglement and Prediction module, and the Attribute-aware Trajectory Generator, respectively.
  • Figure 3: Visualization of our Attribute-Guided Trajectory Augmentation strategies. Based on the identified long-tail attributes of a trajectory, AGTA applies a combination of targeted augmentations (Simplify, Shift, Mask, Subset) to create a diverse set of challenging positive samples for the subsequent contrastive learning stage.
  • Figure 4: Heatmaps illustrating the performance improvements of the SAIL model relative to Q-EANet on the nuScenes dataset, categorized by collision risk levels and prediction horizons. Negative values indicate superior performance by SAIL. (a) Differences in minADE. (b) Differences in minFDE.
  • Figure 5: UpSet visualization of intersections among attribute-specific long-tail subsets on the nuScenes validation set. Subfigures (a) to (d) correspond to the Top 20%, Top 15%, Top 10%, and Top 5% tail thresholds, respectively. In each subfigure, the bars indicate the number of samples in each subset intersection, while the connected dots denote the corresponding combination of Prediction Error, Collision Risk, and State Complexity. The percentages above the bars represent the proportion of each intersection among all samples selected under the given threshold.
  • ...and 6 more figures
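
The intersection counts described for Figure 5 can be sketched as follows. This is a minimal illustration, assuming each sample carries one scalar score per attribute and that a "Top p%" tail is simply the highest-scoring p% of samples for that attribute; the attribute keys, helper names, and toy scores are hypothetical, and the paper's actual scoring and thresholding may differ.

```python
from collections import Counter

# Hypothetical attribute keys for the three long-tail dimensions.
ATTRIBUTES = ("prediction_error", "collision_risk", "state_complexity")


def top_fraction_indices(scores, fraction):
    """Indices of the highest-scoring `fraction` of samples (at least one)."""
    k = max(1, int(round(len(scores) * fraction)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def tail_intersections(scores_by_attr, fraction):
    """Count samples per attribute combination among the selected tails.

    Returns {combination: (count, share)}, where share is the count divided
    by the number of samples that fall in at least one attribute tail.
    """
    tails = {name: top_fraction_indices(scores_by_attr[name], fraction)
             for name in ATTRIBUTES}
    selected = set().union(*tails.values())
    counts = Counter(
        tuple(name for name in ATTRIBUTES if i in tails[name])
        for i in selected
    )
    return {combo: (n, n / len(selected)) for combo, n in counts.items()}


if __name__ == "__main__":
    # Toy scores for 10 samples; sample 0 is extreme on every attribute.
    toy = {
        "prediction_error": [0.90, 0.80, 0.10, 0.20, 0.30, 0.05, 0.15, 0.25, 0.35, 0.40],
        "collision_risk":   [0.90, 0.10, 0.80, 0.20, 0.30, 0.05, 0.15, 0.25, 0.35, 0.40],
        "state_complexity": [0.90, 0.10, 0.20, 0.80, 0.30, 0.05, 0.15, 0.25, 0.35, 0.40],
    }
    for combo, (count, share) in tail_intersections(toy, 0.20).items():
        print(combo, count, f"{share:.0%}")
```

Running the same function at several fractions (0.20, 0.15, 0.10, 0.05) gives one set of intersection counts per threshold, which corresponds to one UpSet panel each.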