PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling
Xinshuo Weng, Ye Yuan, Kris Kitani
TL;DR
The paper introduces PTP, a parallelized framework that jointly optimizes 3D multi-object tracking (MOT) and trajectory forecasting to learn a shared representation of agent interaction. It uses Graph Neural Networks (GNNs) for socially aware feature interaction and a diversity sampling function (DSF) based on Determinantal Point Processes to draw diverse, high-quality future trajectories from a conditional variational autoencoder (CVAE). Empirically, PTP achieves state-of-the-art results on KITTI and nuScenes for both 3D MOT and trajectory forecasting, and ablations confirm the benefits of parallelization, GNNs, and the DSF. By running tracking and prediction in parallel rather than sequentially, the approach limits error propagation from tracking to prediction and offers a unified solution for perception systems in autonomous driving. The work also details its architectural components and training losses, which can inform future research on joint tracking and prediction.
Abstract
Multi-object tracking (MOT) and trajectory prediction are two critical components in modern 3D perception systems that require accurate modeling of multi-agent interaction. We hypothesize that it is beneficial to unify both tasks under one framework in order to learn a shared feature representation of agent interaction. Furthermore, instead of performing tracking and prediction sequentially, which can propagate errors from tracking to prediction, we propose a parallelized framework to mitigate this issue. Our parallel track-forecast framework also incorporates two novel computational units. First, we use a feature interaction technique by introducing Graph Neural Networks (GNNs) to capture the way in which agents interact with one another. The GNN is able to improve discriminative feature learning for MOT association and provide socially aware context for trajectory prediction. Second, we use a diversity sampling function to improve the quality and diversity of our forecasted trajectories. The learned sampling function is trained to efficiently extract a variety of outcomes from a generative trajectory distribution and helps avoid the problem of generating duplicate trajectory samples. We evaluate on the KITTI and nuScenes datasets, showing that our method with socially aware feature learning and diversity sampling achieves new state-of-the-art performance on 3D MOT and trajectory prediction. The project website is: https://www.xinshuoweng.com/projects/PTP
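To make the idea of diversity sampling more concrete, the following is a minimal sketch of how a determinant-based score can penalize duplicate trajectory samples, in the spirit of a Determinantal Point Process. It is not the paper's loss or its learned sampling function: the Gaussian similarity kernel, the `scale` parameter, the flattened waypoint representation, and the function names are all assumptions chosen for illustration.

```python
import math


def gaussian_kernel(trajs, scale=1.0):
    """Similarity matrix K[i][j] = exp(-scale * squared distance).

    Each trajectory is a flat list of waypoint coordinates (assumed format).
    """
    n = len(trajs)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(trajs[i], trajs[j]))
            kernel[i][j] = math.exp(-scale * d2)
    return kernel


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # singular: some samples are (near) duplicates
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def diversity_score(trajs, scale=1.0):
    """Higher when the samples are spread out, zero when two coincide."""
    return determinant(gaussian_kernel(trajs, scale))


# Hypothetical flattened (x, y) waypoints for three candidate futures.
straight = [0.0, 0.0, 1.0, 0.0, 2.0, 0.0]
left = [0.0, 0.0, 1.0, 0.5, 2.0, 1.5]
right = [0.0, 0.0, 1.0, -0.5, 2.0, -1.5]

diverse = diversity_score([straight, left, right])
duplicated = diversity_score([straight, straight, left])
```

In this toy setup the set containing a repeated trajectory scores zero, while the set of three distinct futures scores close to one. A learned sampler trained to maximize a score of this kind would therefore be pushed away from emitting duplicates.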
