DEMO: A Dynamics-Enhanced Learning Model for Multi-Horizon Trajectory Prediction in Autonomous Vehicles
Chengyue Wang, Haicheng Liao, Kaiqun Zhu, Guohui Zhang, Zhenning Li
TL;DR
DEMO tackles autonomous-vehicle trajectory prediction across short-term and long-term horizons by integrating physics-based dynamics with learning-based interaction modeling. It introduces a Dynamics Learning Stage that fuses a Dynamic Bicycle Model with a Dynamic Conditional Variational Autoencoder (DynCVAE) to capture immediate motion, and an Interaction Learning Stage that uses a Temporal Encoder, cross-modal fusion with HD-map data, and a Spatial-temporal Encoder to model social and environmental interactions. A Multi-modal Decoder then generates multiple trajectory hypotheses and maneuver probabilities, supervised by losses including $\mathcal{L}_{KL}$, $\mathcal{L}_{DI}$, and dataset-specific accuracy terms. Across NGSIM, HighD, MoCAD, and nuScenes, DEMO achieves state-of-the-art accuracy for both horizons and exhibits fast inference, indicating strong practical potential for real-time AV systems.
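The TL;DR lists a KL term, $\mathcal{L}_{KL}$, among the training losses for the DynCVAE. Its exact form is not given here, so the sketch below assumes the common CVAE setup in which both the recognition (posterior) and prior networks output diagonal Gaussians, and it computes their closed-form KL divergence. The function name and inputs are hypothetical and are not taken from the authors' code.

```python
import math
from typing import Sequence


def gaussian_kl(mu_q: Sequence[float], logvar_q: Sequence[float],
                mu_p: Sequence[float], logvar_p: Sequence[float]) -> float:
    """KL(q || p) for diagonal Gaussians q = N(mu_q, exp(logvar_q)), p = N(mu_p, exp(logvar_p)).

    Summed over latent dimensions. Assumption: DEMO's L_KL follows this standard
    closed form; the paper may use a different prior or weighting.
    """
    if not (len(mu_q) == len(logvar_q) == len(mu_p) == len(logvar_p)):
        raise ValueError("all inputs must have the same latent dimension")
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        var_q, var_p = math.exp(lq), math.exp(lp)
        # Per-dimension term: log(sigma_p / sigma_q) + (var_q + (mq - mp)^2) / (2 var_p) - 1/2
        kl += 0.5 * (lp - lq + (var_q + (mq - mp) ** 2) / var_p - 1.0)
    return kl


if __name__ == "__main__":
    # Hypothetical 2-D latent: posterior mean shifted by 1 in the first dimension.
    print(gaussian_kl([1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))  # prints 0.5
```

Working in log-variances keeps the variances positive without explicit constraints, which is why CVAE encoders usually output `logvar` rather than `sigma`.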
Abstract
Autonomous vehicles (AVs) rely on accurate trajectory prediction of surrounding vehicles to ensure the safety of both passengers and other road users. Trajectory prediction spans both short-term and long-term horizons, and each has distinct requirements: short-term prediction depends on accurately capturing the vehicle's dynamics, while long-term prediction depends on accurately modeling the interaction patterns within the environment. However, current approaches, whether physics-based or learning-based, generally overlook these distinct requirements, which makes it difficult for them to predict well over both short-term and long-term horizons. In this paper, we introduce the Dynamics-Enhanced Learning MOdel (DEMO), a novel approach that combines a physics-based Vehicle Dynamics Model with advanced deep learning algorithms. DEMO employs a two-stage architecture consisting of a Dynamics Learning Stage, which captures vehicle motion dynamics, and an Interaction Learning Stage, which models interactions. By capitalizing on the respective strengths of both approaches, DEMO enables accurate multi-horizon prediction of future trajectories. Experimental results on the Next Generation Simulation (NGSIM), Macau Connected Autonomous Driving (MoCAD), Highway Drone (HighD), and nuScenes datasets demonstrate that DEMO outperforms state-of-the-art (SOTA) baselines in both short-term and long-term prediction horizons.
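The abstract pairs a physics-based Vehicle Dynamics Model with deep learning, and the TL;DR identifies that model as a Dynamic Bicycle Model. As a rough illustration of what such a model computes, the sketch below integrates a textbook dynamic bicycle model with linear tire forces using forward Euler. The parameter values, function names, and integration scheme are assumptions chosen for illustration; how DEMO couples this model with the DynCVAE is not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VehicleState:
    x: float    # global position [m]
    y: float    # global position [m]
    psi: float  # heading (yaw angle) [rad]
    vx: float   # longitudinal velocity, body frame [m/s]
    vy: float   # lateral velocity, body frame [m/s]
    r: float    # yaw rate [rad/s]


@dataclass
class VehicleParams:
    # Hypothetical passenger-car values, for illustration only.
    m: float = 1500.0    # mass [kg]
    iz: float = 2500.0   # yaw moment of inertia [kg m^2]
    lf: float = 1.2      # CoG to front axle [m]
    lr: float = 1.6      # CoG to rear axle [m]
    cf: float = 80000.0  # front cornering stiffness [N/rad]
    cr: float = 80000.0  # rear cornering stiffness [N/rad]


def step(s: VehicleState, delta: float, accel: float, p: VehicleParams,
         dt: float, min_speed: float = 1.0) -> VehicleState:
    """One forward-Euler step of a dynamic bicycle model with linear tire forces."""
    vx = max(s.vx, min_speed)  # slip angles divide by vx; avoid blow-up at low speed
    # Front/rear tire slip angles (small-angle approximation).
    alpha_f = delta - (s.vy + p.lf * s.r) / vx
    alpha_r = -(s.vy - p.lr * s.r) / vx
    # Linear lateral tire forces.
    fyf = p.cf * alpha_f
    fyr = p.cr * alpha_r
    # Global-frame kinematics.
    dx = s.vx * math.cos(s.psi) - s.vy * math.sin(s.psi)
    dy = s.vx * math.sin(s.psi) + s.vy * math.cos(s.psi)
    # Body-frame Newton-Euler dynamics.
    dvx = accel - fyf * math.sin(delta) / p.m + s.vy * s.r
    dvy = (fyf * math.cos(delta) + fyr) / p.m - s.vx * s.r
    dr = (p.lf * fyf * math.cos(delta) - p.lr * fyr) / p.iz
    return VehicleState(
        x=s.x + dt * dx, y=s.y + dt * dy, psi=s.psi + dt * s.r,
        vx=s.vx + dt * dvx, vy=s.vy + dt * dvy, r=s.r + dt * dr,
    )


def rollout(s: VehicleState, controls: List[Tuple[float, float]],
            p: VehicleParams, dt: float) -> List[VehicleState]:
    """Propagate the state under a sequence of (steering, acceleration) inputs."""
    traj = [s]
    for delta, accel in controls:
        s = step(s, delta, accel, p, dt)
        traj.append(s)
    return traj


if __name__ == "__main__":
    start = VehicleState(x=0.0, y=0.0, psi=0.0, vx=10.0, vy=0.0, r=0.0)
    # 1 s of constant left steering at 10 m/s, sampled every 0.02 s.
    traj = rollout(start, [(0.05, 0.0)] * 50, VehicleParams(), dt=0.02)
    end = traj[-1]
    print(f"x={end.x:.2f} m, y={end.y:.2f} m, yaw={end.psi:.3f} rad")
```

A rollout like this tends to be reliable over short horizons, where motion is dominated by the vehicle's own dynamics, but it does not account for other road users. That limitation is the one the Interaction Learning Stage is described as addressing.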
