
Diffusion-Based Environment-Aware Trajectory Prediction

Theodor Westny, Björn Olofsson, Erik Frisk

TL;DR

This work tackles the challenge of predicting future trajectories for multiple road users under multimodal uncertainty by introducing a diffusion-based conditional model that jointly represents inter-agent interactions and environment knowledge via Graph Attention Networks and Graph-GRU. It integrates lane-graph map information and enforces physical feasibility through differential motion constraints, while enabling interaction-aware guidance to modulate samples. The approach achieves state-of-the-art accuracy on the real-world highD and rounD datasets and demonstrates the model's capacity for multimodal predictions with few diffusion steps. The results suggest practical applicability for robust motion planning in autonomous driving, including scenarios with less cooperative agents and varying interaction strength.

Abstract

The ability to predict the future trajectories of traffic participants is crucial for the safe and efficient operation of autonomous vehicles. In this paper, a diffusion-based generative model for multi-agent trajectory prediction is proposed. The model is capable of capturing the complex interactions between traffic participants and the environment, accurately learning the multimodal nature of the data. The effectiveness of the approach is assessed on large-scale datasets of real-world traffic scenarios, showing that our model outperforms several well-established methods in terms of prediction accuracy. By the incorporation of differential motion constraints on the model output, we illustrate that our model is capable of generating a diverse set of realistic future trajectories. Through the use of an interaction-aware guidance signal, we further demonstrate that the model can be adapted to predict the behavior of less cooperative agents, emphasizing its practical applicability under uncertain traffic conditions.

Paper Structure (31 sections, 26 equations, 6 figures, 7 tables)

This paper contains 31 sections, 26 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: The directed graphical model considered in this work. The goal of diffusion models is to learn the process of transforming noise into samples that are representative of the true data distribution. In this work, the task of the proposed model is to generate realistic and physically feasible future trajectories for road users.
  • Figure 2: Schematic illustration of the proposed model. The model takes as input the condition $\bm{c} = \{\mathcal{H}, \mathcal{J}\}$, the current latent state $\bm{x}_{t}$, and the diffusion step $t$ to predict agent trajectories $\bm{x}_{0}$. The state history $\mathcal{H}$ up until the prediction time instant and the lane graph $\mathcal{J}$ are encoded using two modules. The diffusion step $t$ is passed through a Fourier feature encoder (Song et al., 2020) and then summed with the embedded latent state. Next, the sum is passed through two attention mechanisms (Vaswani et al., 2017), one for each encoded condition part, and the resulting representations are concatenated and fused using an MLP to create a combined context encoding. The context encoding is then input into a Graph-GRU decoder (together with the last hidden state of the encoder) to compute the motion control inputs $\bm{u}$. Using the computed control inputs and the agent states $\bm{x}_{0}^{\text{init}}$ at the prediction time instant, the predicted trajectories are obtained by numerical integration.
  • Figure 3: Three example predictions on the highD dataset test set. The vehicles in the plots represent the agents in the scene at the prediction time instant. The predicted trajectories are shown as scatter plots in the same color as the agent under investigation, while the ground truth is shown with a solid black line. For visual clarity, we only show the predicted trajectories for vehicles performing a lane change (the model predicts lane-keeping maneuvers with equal accuracy).
  • Figure 4: Three example predictions on the rounD dataset test set. The predicted trajectories are shown as scatter plots in the same color as the agent under investigation, while the ground truth is shown with a solid black line. Note the accurate prediction of the pedestrian in the rightmost plot. Since the state-transition dynamics are parametrized using neural ODEs, this illustrates that the model has learned a motion model that is representative of the real-world data.
  • Figure 5: Three example predictions on the rounD dataset test set. The multiple predicted trajectories for a specific agent are shown as same-colored scatter plots. The scenarios were chosen to illustrate the model's ability to predict a diverse set of future trajectories. Each prediction is based on a unique sample from the diffusion process, thereby showcasing that the model has implicitly learned to capture the multimodal nature of the data.
  • ...and 1 more figures
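
The last step in the Figure 2 caption, turning predicted control inputs into trajectories by numerical integration from the agent's state at the prediction time instant, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the Figure 4 caption states that the paper parametrizes the state-transition dynamics with neural ODEs, whereas this example swaps in a hand-written unicycle-style model with state (x, y, speed, heading), control (acceleration, yaw rate), and a forward-Euler solver. The function names `unicycle_dynamics` and `rollout` are hypothetical.

```python
import math


def unicycle_dynamics(state, control):
    """Time derivative of (x, y, speed, heading) given (acceleration, yaw_rate).

    Assumed stand-in for the learned (neural ODE) dynamics in the paper.
    """
    _, _, v, psi = state
    a, omega = control
    return (v * math.cos(psi), v * math.sin(psi), a, omega)


def rollout(init_state, controls, dt):
    """Integrate a control sequence with forward Euler, starting at init_state.

    Returns one state per control input; the initial state is not included.
    """
    state = tuple(init_state)
    trajectory = []
    for control in controls:
        deriv = unicycle_dynamics(state, control)
        state = tuple(s + dt * d for s, d in zip(state, deriv))
        trajectory.append(state)
    return trajectory


# Hypothetical agent: at the origin, 10 m/s, heading along the x-axis.
init = (0.0, 0.0, 10.0, 0.0)

# Two control sequences (5 steps of 0.2 s) give two distinct futures.
straight = rollout(init, [(0.0, 0.0)] * 5, dt=0.2)
turning = rollout(init, [(0.0, 0.1)] * 5, dt=0.2)

print("straight end:", straight[-1][:2])
print("turning end: ", turning[-1][:2])
```

Because every trajectory is produced by integrating controls through a motion model, each sample is kinematically consistent with its starting state by construction. Different control sequences, such as those decoded from different diffusion samples, yield different but still feasible futures.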