TrajDiffuse: A Conditional Diffusion Model for Environment-Aware Trajectory Prediction

Qingze Liu, Danrui Li, Samuel S. Sohn, Sejong Yoon, Mubbasir Kapadia, Vladimir Pavlovic

TL;DR

TrajDiffuse formulates environment-aware trajectory prediction as a conditional denoising diffusion process that interpolates between an agent's observed history and its predicted motion intent. A novel map-based gradient guidance term enforces environmental feasibility during sample generation, producing accurate, diverse, and environment-compliant trajectories. On PFSD and nuScenes, the approach achieves state-of-the-art performance across accuracy, diversity, and environmental feasibility metrics: it excels on the ECFL feasibility metric while remaining competitive on average and final displacement error (ADE/FDE) and on the negative log-likelihood under a kernel density estimate (KDE NLL). The framework opens avenues for incorporating richer scene dynamics and end-to-end intent conditioning, with potential extensions to multi-agent interactions and more complex HD-map cues.

Abstract

Accurate prediction of human or vehicle trajectories with good diversity that captures their stochastic nature is an essential task for many applications. However, many trajectory prediction models focus on improving diversity or accuracy and produce unreasonable trajectory samples that neglect other key requirements, such as collision avoidance with the surrounding environment. In this work, we propose TrajDiffuse, a planning-based trajectory prediction method using a novel guided conditional diffusion model. We formulate the trajectory prediction problem as a denoising inpainting task and design a map-based guidance term for the diffusion process. TrajDiffuse is able to generate trajectory predictions that match or exceed the accuracy and diversity of the SOTA, while adhering almost perfectly to environmental constraints. We demonstrate the utility of our model through experiments on the nuScenes and PFSD datasets and provide an extensive benchmark analysis against the SOTA methods.
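
The two ideas named in the abstract, inpainting-style conditioning and map-based guidance, can be illustrated with a toy sketch. This is not the authors' implementation: the learned denoiser is omitted entirely, and the grid, the step size, and the function names (`distance_to_navigable`, `guide`, `inpaint`) are hypothetical choices made only for illustration. The sketch assumes that guidance follows the negative gradient of a distance-to-navigable-area map, and that conditioning is done by overwriting the known time steps (observed history and a predicted waypoint) after every update.

```python
import numpy as np

def distance_to_navigable(navigable):
    """Brute-force Euclidean distance (in cells) from every cell to the nearest navigable cell."""
    h, w = navigable.shape
    nav = np.argwhere(navigable).astype(float)                        # (M, 2) navigable cells
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    grid = np.stack([rows, cols], axis=-1).reshape(-1, 2).astype(float)
    d = np.linalg.norm(grid[:, None, :] - nav[None, :, :], axis=-1).min(axis=1)
    return d.reshape(h, w)

def guide(traj, dist, step=1.0):
    """Assumed guidance step: move each (row, col) point down the gradient of the distance map."""
    g_row, g_col = np.gradient(dist)
    idx = np.clip(np.rint(traj).astype(int), 0, np.array(dist.shape) - 1)
    grad = np.stack([g_row[idx[:, 0], idx[:, 1]], g_col[idx[:, 0], idx[:, 1]]], axis=1)
    return traj - step * grad

def inpaint(traj, known, mask):
    """Overwrite conditioned time steps (history, waypoint) with their known values."""
    return np.where(mask[:, None], known, traj)

# Toy 9x9 map: only rows 3..5 are navigable (a horizontal corridor).
navigable = np.zeros((9, 9), dtype=bool)
navigable[3:6, :] = True
dist = distance_to_navigable(navigable)

# Five time steps in (row, col): two observed points, two free points, one waypoint.
known = np.array([[4, 0], [4, 1], [0, 0], [0, 0], [4, 4]], dtype=float)
mask = np.array([True, True, False, False, True])

# Stand-in for a noisy diffusion sample: the two free points start off the corridor.
traj = np.array([[4, 0], [4, 1], [7, 2], [1, 3], [4, 4]], dtype=float)

# A real sampler would apply a learned denoising step inside this loop as well.
for _ in range(2):
    traj = inpaint(guide(traj, dist), known, mask)
final = traj
```

In this toy setting the two free points are pulled from rows 7 and 1 onto the corridor edge, while the conditioned points stay fixed. The guidance is applied unconditionally here; a more careful variant would switch it off once a point is already inside the navigable area.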

Paper Structure

This paper contains 38 sections, 20 equations, 6 figures, 7 tables, 2 algorithms.

Figures (6)

  • Figure 1: Top Left: Illustration of the denoising trajectory prediction process. Green dots indicate the observed trajectory and the predicted waypoints. Red dots are the denoised prediction. Top Right: A map from the nuScenes dataset overlaid with the distance transform showing the distance to the navigable areas (roads) in the scene. The color scale represents the normalized distance from closest (blue) to farthest (red). Bottom: Comparison against other SOTA methods on the PFSD dataset.
  • Figure 1: Visualizations for PFSD with complex layouts and hard maneuvers with $K = 20$. Each column contains visualizations of an agent's trajectories predicted by the model indicated at the top of the column. Each row corresponds to the agent with identical initial conditions and identical prior motion history. The blue dashed line indicates the observed and ground-truth (GT) trajectory; the red dashed line indicates the predicted trajectory.
  • Figure 2: Details of the TrajDiffuse model structure. Left: Prediction pipeline of the TrajDiffuse model. Here we represent the data as a two-channel one-dimensional signal; the two channels in the input and output correspond to the two dimensions of the position coordinate. Right: Illustration of the conditional denoising process inside a diffusion block. The input and output are conditioned on the observed trajectory and the predicted intents. The U-Net-encoded bottleneck features are attended across the channels and decoded. The output is then conditioned on the observed trajectory history and the predicted waypoints.
  • Figure 2: Visualizations for hard nuScenes instances with slow observed trajectories with $K = 10$. Each column contains visualizations of an agent's trajectories predicted by the model indicated at the top of the column. Each row corresponds to the agent with identical initial conditions and identical prior motion history. The blue dashed line indicates the observed and ground-truth (GT) trajectory; the red dashed line indicates the predicted trajectory.
  • Figure 3: Visualizations for qualitative analysis on PFSD and nuScenes datasets. Each column contains visualizations of an agent's trajectories predicted by the model indicated at the top of the column. Each row corresponds to the agent with identical initial conditions and identical prior motion history. Blue dashed lines denote the ground-truth trajectories. Red dashed lines are predicted trajectories.
  • ...and 1 more figure