Post-interactive Multimodal Trajectory Prediction for Autonomous Driving
Ziyi Huang, Yang Li, Dushuai Li, Yao Mu, Hongmao Qin, Nan Zheng
TL;DR
This work addresses uncertainty in trajectory prediction for autonomous driving by modeling post-interaction features, i.e., the interactions among agents' predicted trajectories, which have been underexplored. It introduces Pioformer, a coarse-to-fine Transformer framework consisting of a Coarse Trajectory Network (CTN), a Trajectory Proposal Network (TPN) built on a hypergraph neural network (HGNN) module called the Hyper-Interactor, and a Proposal Refinement Network (PRN) that refines trajectory proposals using post-interaction cues. A three-stage training scheme trains CTN, TPN, and PRN progressively to stabilize learning and exploit high-order interactions, achieving strong accuracy with a compact model on Argoverse 1 and generalizing to Argoverse 2. The approach also benefits motion planning, yielding safer and more reliable ego-vehicle plans in strongly interactive scenarios. Overall, Pioformer advances multimodal trajectory prediction by explicitly modeling high-order interactions and post-interactions and by coupling refinement with planning considerations, while keeping a favorable balance between model size and accuracy.
Abstract
Modeling the interactions among agents for trajectory prediction in autonomous driving is challenging because of the inherent uncertainty in agents' behavior. The interactions among the predicted trajectories of agents, also called post-interactions, have rarely been considered in trajectory prediction models. To this end, we propose Pioformer, a coarse-to-fine Transformer for multimodal trajectory prediction that explicitly extracts post-interaction features to improve prediction accuracy. Specifically, we first build a Coarse Trajectory Network that generates coarse trajectories from the observed trajectories and lane segments, extracting low-order interaction features with graph neural networks. Next, we build a hypergraph neural network-based Trajectory Proposal Network that generates trajectory proposals, with high-order interaction features learned by the hypergraphs. Finally, the trajectory proposals are sent to the Proposal Refinement Network for further refinement. The observed trajectories and trajectory proposals are concatenated as the input to the Proposal Refinement Network, in which post-interaction features are learned by combining the previous interaction features with trajectory consistency features. Moreover, we propose a three-stage training scheme to facilitate learning. Extensive experiments on the Argoverse 1 dataset demonstrate the superiority of our method. Compared with the baseline HiVT-64, our model reduces prediction error by 4.4%, 8.4%, 14.4%, and 5.7% on the minADE6, minFDE6, MR6, and brier-minFDE6 metrics, respectively.
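For readers unfamiliar with the four metrics named above, the following is a minimal sketch of how they are commonly computed for one scenario with K predicted modes (K = 6 for the "6" suffix). It is not taken from the paper: the function names, the 2.0 m miss threshold, and the choice of the "best" mode as the one with the smallest endpoint error are assumptions that follow the usual Argoverse leaderboard convention, and some works instead take the minimum over per-mode ADEs.

```python
import math


def displacement_errors(pred, gt):
    """Per-timestep Euclidean distance between a predicted and a ground-truth trajectory."""
    return [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)]


def score_scenario(modes, probs, gt, miss_threshold=2.0):
    """Score one agent in one scenario.

    modes: K trajectories, each a list of (x, y) points.
    probs: K mode probabilities (assumed to sum to 1).
    gt:    ground-truth trajectory with the same number of points.
    miss_threshold: assumed 2.0 m, the usual Argoverse setting.
    """
    ades, fdes = [], []
    for traj in modes:
        errs = displacement_errors(traj, gt)
        ades.append(sum(errs) / len(errs))  # average displacement error of this mode
        fdes.append(errs[-1])               # final displacement error of this mode

    # Assumption: the "best" mode is the one with the smallest endpoint error.
    best = min(range(len(modes)), key=lambda k: fdes[k])
    min_fde = fdes[best]
    return {
        "minADE": ades[best],
        "minFDE": min_fde,
        "missed": min_fde > miss_threshold,
        # Brier penalty: squared shortfall of the best mode's probability.
        "brier-minFDE": min_fde + (1.0 - probs[best]) ** 2,
    }


def miss_rate(results):
    """MR: fraction of scored scenarios whose best endpoint misses the threshold."""
    return sum(r["missed"] for r in results) / len(results)


# Toy example with K = 2 modes and 3 future timesteps (values are illustrative only).
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
modes = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)],
    [(0.0, 1.0), (1.0, 1.0), (2.0, 3.0)],
]
result = score_scenario(modes, probs=[0.6, 0.4], gt=gt)
print(result)
```

Dataset-level numbers are the averages of these per-scenario values, so a relative reduction such as those reported against HiVT-64 compares the averaged metric of the two models.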
