Diffusion^2: Dual Diffusion Model with Uncertainty-Aware Adaptive Noise for Momentary Trajectory Prediction

Yuhao Luo, Yuang Zhang, Kehua Chen, Xinyu Zheng, Shucheng Zhang, Sikai Chen, Yinhai Wang

TL;DR

This work tackles momentary pedestrian trajectory prediction when only two frames are observed. It introduces Diffusion^2, a two-stage diffusion framework that first reconstructs unobserved history and its aleatoric uncertainty, then predicts future trajectories using a context-enhanced forward diffusion model guided by a temporally adaptive noise scheduler. A dual-head parameterization provides per-coordinate uncertainty estimates, and a learned noise-scheduling mechanism adjusts noise injection based on predicted uncertainty, achieving state-of-the-art results on ETH/UCY and SDD. The approach offers accurate, multimodal trajectory generation with explicit uncertainty handling, though it incurs higher inference cost and computational demands than some discriminative baselines, suggesting directions for efficiency improvements and broader deployment in dynamic traffic settings.

Abstract

Accurate pedestrian trajectory prediction is crucial for ensuring safety and efficiency in autonomous driving and human-robot interaction scenarios. Earlier studies primarily utilized sufficient observational data to predict future trajectories. However, in real-world scenarios, such as pedestrians suddenly emerging from blind spots, sufficient observational data is often unavailable (i.e. momentary trajectory), making accurate prediction challenging and increasing the risk of traffic accidents. Therefore, advancing research on pedestrian trajectory prediction under extreme scenarios is critical for enhancing traffic safety. In this work, we propose a novel framework termed Diffusion^2, tailored for momentary trajectory prediction. Diffusion^2 consists of two sequentially connected diffusion models: one for backward prediction, which generates unobserved historical trajectories, and the other for forward prediction, which forecasts future trajectories. Given that the generated unobserved historical trajectories may introduce additional noise, we propose a dual-head parameterization mechanism to estimate their aleatoric uncertainty and design a temporally adaptive noise module that dynamically modulates the noise scale in the forward diffusion process. Empirically, Diffusion^2 sets a new state-of-the-art in momentary trajectory prediction on ETH/UCY and Stanford Drone datasets.
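The dual-head idea mentioned in the abstract (one output for the prediction, one for its variance) is commonly trained with a heteroscedastic Gaussian negative log-likelihood. The sketch below illustrates that general technique only; the loss actually used by Diffusion^2 is not stated in this summary, and the names `gaussian_nll`, `mean`, and `log_var`, as well as the array shapes, are assumptions made for illustration.

```python
import numpy as np

def gaussian_nll(target, mean, log_var):
    """Per-coordinate heteroscedastic Gaussian NLL (additive constant dropped).

    `mean` plays the role of the first head's output and `log_var` the
    second head's output; both are assumptions, not the paper's notation.
    """
    inv_var = np.exp(-log_var)
    return 0.5 * (log_var + inv_var * (target - mean) ** 2)

rng = np.random.default_rng(0)
target = rng.normal(size=(6, 2))     # hypothetical: 6 unobserved steps, (x, y)
mean = target + 0.1                  # a prediction that is off by 0.1 everywhere
log_var = np.zeros_like(target)      # unit variance for every coordinate

loss = gaussian_nll(target, mean, log_var).mean()
```

Predicting the log-variance rather than the variance keeps the second head unconstrained while guaranteeing a positive variance after exponentiation.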

Paper Structure

This paper contains 24 sections, 22 equations, 7 figures, 3 tables, 2 algorithms.

Figures (7)

  • Figure 1: Three distinct frameworks are proposed to address momentary trajectory prediction. (a) The model directly utilizes observable trajectories to predict future trajectories. (b) The approach jointly predicts unobservable historical trajectories and future trajectories. (c) Our proposed framework, Diffusion$^\mathbf{2}$, consists of two sequentially connected diffusion models: one dedicated to backward prediction, and the other to forward prediction.
  • Figure 2: The overview of our proposed Diffusion$^\mathbf{2}$: (a) Framework: Diffusion$^\mathbf{2}$ consists of two sequentially connected diffusion models, $\textit{DDPM}_{past}$ and $\textit{DDPM}_{fut}$. $\textit{DDPM}_{past}$ simultaneously reconstructs the unobserved historical trajectory $\mathbf{x}^{ubs}_0$ and estimates its associated uncertainty $\mathbf{u}$. Then, $\textit{DDPM}_{fut}$ predicts the future trajectories $\mathbf{y}_0$. Both diffusion models share a common encoder that extracts contextual features $\mathbf{h_1}$ from the observed history $\mathbf{x}^{obs}$, while $\textit{DDPM}_{fut}$ employs an additional trajectory encoder to extract features $\mathbf{v}_1$ from the predicted unobserved historical trajectory $\mathbf{x}^{ubs}_0$. (b) Dual-head Parameterization Mechanism: The standard noise prediction network is augmented with two output heads, one for predicting noise and the other for predicting variance, which corresponds to aleatoric uncertainty. (c) Learnable Temporally Adaptive Noise Scheduling: A gamma module takes $\mathbf{u}$ and the diffusion step $m$ as inputs and dynamically modulates the noise scale in the forward diffusion process of $\textit{DDPM}_{fut}$. Note that the red dashed line indicates the unobserved historical trajectory, with gray shading representing its uncertainty, while the green and blue dashed lines denote the observed history and future trajectories, respectively.
  • Figure 3: A visualization of the learned $\lambda$ and $l$ over diffusion step $m$, where each $l_m = \operatorname{sigmoid}(\gamma_{\varphi}(\mathbf{u}, m))$.
  • Figure 4: Visualization of generated trajectories across reverse-diffusion time $m$. The red and green lines represent the ground-truth future and observable trajectories, respectively. As $m$ increases, the process progressively suppresses indeterminacy and converges toward the desired future path. Initialized from pure Gaussian noise at $m=0$ and conditioned on context information, the model gradually removes implausible hypotheses and aligns with the ground-truth future (red, solid).
  • Figure 5: Visualization of predicted unobservable historical and future trajectories on the ETH/UCY dataset. The red lines represent the unobserved historical trajectories, while the cyan lines denote the observed historical trajectories. The ground-truth future trajectories are marked in light blue. Predictions from Diffusion$^\mathbf{2}$ are visualized as blue dashed lines for the future and orange dashed lines for the history, whereas the PCCSNet predictions are shown as magenta dashed lines.
  • ...and 2 more figures
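
The relation in the Figure 3 caption, $l_m = \operatorname{sigmoid}(\gamma_{\varphi}(\mathbf{u}, m))$, can be illustrated with a minimal sketch. The architecture of the gamma module is not given here, so `gamma_phi` below is a hypothetical linear stand-in with made-up weights, and the uncertainty vector `u` and step count `M` are likewise assumed values; only the sigmoid gating itself comes from the caption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gamma_phi(u, m, w_u, w_m, b):
    """Hypothetical stand-in for the learned gamma module: linear in (u, m)."""
    return u @ w_u + w_m * m + b

u = np.array([0.2, 0.5])             # assumed uncertainty summary
M = 10                               # assumed number of diffusion steps
w_u, w_m, b = np.array([1.0, -0.5]), 0.3, -1.0   # made-up parameters

# l_m = sigmoid(gamma_phi(u, m)) for each diffusion step m = 1..M
l = np.array([sigmoid(gamma_phi(u, m, w_u, w_m, b)) for m in range(1, M + 1)])
```

The sigmoid keeps every $l_m$ strictly between 0 and 1, so whatever the gamma module outputs, the resulting value can act as a bounded, step-dependent gate.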