HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention

Xiaolong Tang; Meina Kan; Shiguang Shan; Zhilong Ji; Jinfeng Bai; Xilin Chen

TL;DR

HPNet addresses instability in trajectory forecasting by treating prediction as a dynamic task and introducing Historical Prediction Attention, which uses historical predictions to inform the current forecast. It combines Spatio-Temporal Context Encoding with Triple Factorized Attention (Agent Attention, Historical Prediction Attention, and Mode Attention) to model agent interactions, correlations between successive predictions, and relationships among multimodal modes, followed by a multimodal output head that refines proposals. On Argoverse and INTERACTION, HPNet achieves state-of-the-art results, and ablations show that each attention component, especially Historical Prediction Attention, contributes to accuracy and stability. The approach extends the effective attention range without additional computation, delivering more stable and accurate future trajectories for planning in autonomous driving systems while keeping inference latency feasible.

Abstract

Predicting the trajectories of road agents is essential for autonomous driving systems. Recent mainstream methods follow a static paradigm, which predicts the future trajectory from a fixed duration of historical frames. These methods make predictions independently even at adjacent time steps, which leads to potential instability and temporal inconsistency. Because successive time steps have largely overlapping historical frames, their forecasts should be intrinsically correlated: for example, overlapping predicted trajectories should be consistent, or differ yet share the same motion goal, depending on the road situation. Motivated by this, we introduce HPNet, a novel dynamic trajectory forecasting method. Aiming for stable and accurate trajectory forecasting, our method leverages not only historical frames, including maps and agent states, but also historical predictions. Specifically, we design a new Historical Prediction Attention module to automatically encode the dynamic relationship between successive predictions. By using historical predictions, it also extends the attention range beyond the currently visible window. The proposed Historical Prediction Attention, together with Agent Attention and Mode Attention, is further formulated as the Triple Factorized Attention module, which serves as the core design of HPNet. Experiments on the Argoverse and INTERACTION datasets show that HPNet achieves state-of-the-art performance and generates accurate and stable future trajectories. Our code is available at https://github.com/XiaolongTang23/HPNet.
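The idea of attending over historical predictions can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one prediction embedding per time step for a single agent and a single mode, omits all learned projections, and uses a hypothetical `window` parameter for how many earlier steps each step may look back on. The function name and the residual update are likewise assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def historical_prediction_attention(embeddings, window):
    """Refine each step's embedding using embeddings from earlier steps.

    embeddings: one vector (list of floats) per time step, for one agent
                and one mode (assumed layout, not taken from the paper).
    window:     hypothetical number of earlier steps each step attends to.
    Queries, keys, and values are the raw embeddings (no learned weights).
    """
    dim = len(embeddings[0])
    refined = []
    for t, query in enumerate(embeddings):
        history = embeddings[max(0, t - window):t]
        if not history:
            # The first step has no historical prediction to attend to.
            refined.append(list(query))
            continue
        # Scaled dot-product scores between the current step and its history.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in history
        ]
        weights = softmax(scores)
        # Weighted sum of historical embeddings, added back as a residual.
        context = [
            sum(w * value[d] for w, value in zip(weights, history))
            for d in range(dim)
        ]
        refined.append([q + c for q, c in zip(query, context)])
    return refined


embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
refined = historical_prediction_attention(embeddings, window=2)
```

In this toy setting each step only reads embeddings from steps before it, so the forecast at one step is explicitly conditioned on what was predicted earlier rather than being computed independently.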

Paper Structure (18 sections, 20 equations, 6 figures, 3 tables)

Introduction
Related Work
Method
Spatio-Temporal Context Encoding
Agent Attention
Historical Prediction Attention
Mode Attention
Multimodal Output
Training Objective
Experiments
Experimental Setup
Comparison with State-of-the-art
Ablation Study
Conclusion
Training Objective for Joint Prediction
...and 3 more sections

Figures (6)

Figure 1: The difference between previous methods and ours. Previous methods (upper) treat trajectory prediction as a static task, predicting future trajectories based on a fixed-length sequence of historical frames. They independently forecast trajectories even at adjacent timesteps, despite the considerable overlap in input data. In contrast, HPNet (lower) views trajectory prediction as a dynamic task. It not only leverages historical frames but also historical prediction embeddings to forecast trajectories.
Figure 2: An overview of HPNet. The proposed HPNet encompasses three components: Spatio-Temporal Context Encoding, Triple Factorized Attention, and Multimodal Output. First, it combines agent and lane features with mode queries to create initial prediction embeddings. Subsequently, Triple Factorized Attention (comprising Agent Attention, our proposed Historical Prediction Attention, and Mode Attention) refines these prediction embeddings. Finally, the prediction embeddings are decoded by an MLP to obtain the predicted trajectories. The predicted trajectories are fed into this pipeline again to enhance the precision of predictions.
Figure 3: Comparison of prediction Accuracy (b-minFDE$\downarrow$) and Stability (summed ADE$\downarrow$) of our HPNet and its baseline without Historical Prediction Attention on the Argoverse validation set.
Figure 4: Qualitative results on the Argoverse validation set. Baseline (a) alternately forecasts one motion goal (i.e., turn left) and two motion goals (i.e., turn left and go straight). In contrast, HPNet (b) consistently and reliably predicts the same motion goal (i.e., turn left). The lanes, historical trajectory, ground truth trajectory, and six predicted trajectories are indicated in grey, green, red, and blue, respectively.
Figure 5: Predictions of HPNet (lower) and baseline (upper).
...and 1 more figure
