C2F-TP: A Coarse-to-Fine Denoising Framework for Uncertainty-Aware Trajectory Prediction

Zichen Wang, Hao Miao, Senzhang Wang, Renzhi Wang, Jianxin Wang, Jian Zhang

TL;DR

C2F-TP addresses uncertainty in vehicle trajectory prediction with a two-stage, coarse-to-fine framework: it first learns a multimodal distribution over future trajectories through a spatial-temporal interaction module, then refines trajectories sampled from that distribution with a conditional diffusion-based denoising model. The spatial-temporal module combines motion encoding, wave-inspired interaction pooling, and a re-weighted multimodal predictor to generate multiple plausible future maneuvers and their corresponding trajectory distributions. The refinement stage denoises $k$ sampled trajectories conditioned on historical context, yielding more accurate and stable predictions, with strong empirical results on NGSIM and highD. By explicitly modeling multimodality and the temporal correlation of inter-vehicle interactions, the approach advances uncertainty-aware trajectory prediction and offers practical gains for safer autonomous driving and planning systems.

Abstract

Accurately predicting vehicle trajectories is critically important for ensuring safety and reliability in autonomous driving. Although considerable research effort has been devoted to this problem recently, the inherent trajectory uncertainty caused by various factors, including dynamic driving intentions and diverse driving scenarios, still poses significant challenges to accurate trajectory prediction. To address this issue, we propose C2F-TP, a coarse-to-fine denoising framework for uncertainty-aware vehicle trajectory prediction. C2F-TP features a two-stage coarse-to-fine prediction process. Specifically, in the spatial-temporal interaction stage, we propose a spatial-temporal interaction module to capture inter-vehicle interactions and learn a multimodal trajectory distribution, from which a certain number of noisy trajectories are sampled. Next, in the trajectory refinement stage, we design a conditional denoising model to reduce the uncertainty of the sampled trajectories through a step-wise denoising operation. Extensive experiments are conducted on two real-world datasets, NGSIM and highD, that are widely adopted in trajectory prediction. The results demonstrate the effectiveness of our proposal.
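
The two-stage control flow described in the abstract (sample noisy trajectories from a multimodal distribution, then refine them step by step) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names `sample_coarse`, `toy_denoise_step`, and `refine` are hypothetical, the multimodal distribution is a hand-written Gaussian mixture rather than the output of the spatial-temporal interaction module, and the denoising step is a fixed pull toward a constant-velocity extrapolation of the history rather than the paper's learned conditional denoising model.

```python
import random


def sample_coarse(modes, weights, k, rng):
    """Coarse stage (toy): draw k noisy trajectories from a Gaussian mixture.

    modes   -- list of (mean_trajectory, std); a trajectory is a list of (x, y)
    weights -- probability of each mode (maneuver)
    """
    samples = []
    for _ in range(k):
        mean_traj, std = rng.choices(modes, weights=weights, k=1)[0]
        samples.append([(rng.gauss(x, std), rng.gauss(y, std))
                        for x, y in mean_traj])
    return samples


def toy_denoise_step(traj, anchor, strength):
    """Hypothetical stand-in for one learned denoising step.

    Moves every point a fixed fraction of the way toward the anchor.
    """
    return [(x + strength * (ax - x), y + strength * (ay - y))
            for (x, y), (ax, ay) in zip(traj, anchor)]


def refine(samples, history, steps=10, strength=0.2):
    """Fine stage (toy): step-wise refinement conditioned on the history.

    The conditioning signal is a constant-velocity extrapolation of the
    last two observed positions (an assumption, not the paper's method).
    """
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0
    horizon = len(samples[0])
    anchor = [(x1 + vx * (t + 1), y1 + vy * (t + 1)) for t in range(horizon)]
    refined = []
    for traj in samples:
        for _ in range(steps):
            traj = toy_denoise_step(traj, anchor, strength)
        refined.append(traj)
    return refined


# Usage: two maneuvers (keep lane, drift left), six sampled trajectories.
rng = random.Random(0)
history = [(0.0, 0.0), (1.0, 0.0)]
keep_lane = [(2.0 + t, 0.0) for t in range(5)]
drift_left = [(2.0 + t, 0.5 * (t + 1)) for t in range(5)]
modes = [(keep_lane, 0.3), (drift_left, 0.3)]
coarse = sample_coarse(modes, [0.7, 0.3], k=6, rng=rng)
fine = refine(coarse, history)
```

Note that this toy refinement pulls every sample toward a single anchor and so collapses the modes; it is meant only to show the sample-then-refine structure, not the behaviour of a trained denoiser.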

Paper Structure

This paper contains 27 sections, 20 equations, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: Illustration of the dynamics and temporal correlation of inter-vehicle interactions.
  • Figure 2: Interaction between two waves with different phases. The lower panel shows the superposition of the two waves in the complex-valued domain; the upper panel shows how their projections onto the real axis vary with phase.
  • Figure 3: Framework of the proposed C2F-TP model, which contains a spatial-temporal interaction module and a refinement module. The spatial-temporal interaction module captures the dynamics and temporal correlations of inter-vehicle interactions and generates a multimodal trajectory distribution. A certain number of noisy trajectories are sampled from the multimodal trajectory distribution. The refinement module then denoises these noisy trajectories and finally generates accurate predicted trajectories.
  • Figure 4: Comparison between C2F-TP and its variants.
  • Figure 5: Visualisation of S-LSTM, CS-LSTM and C2F-TP for three driving scenarios.
  • ...and 2 more figures
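
The wave superposition described in the Figure 2 caption can be checked numerically. The sketch below is a generic illustration of adding two complex-valued waves and reading off the real-axis projection; the helper name `superpose` is hypothetical, and the code does not reproduce the paper's interaction pooling.

```python
import cmath
import math


def superpose(amp1, phase1, amp2, phase2):
    """Sum two waves given as amplitude and phase in the complex plane."""
    return cmath.rect(amp1, phase1) + cmath.rect(amp2, phase2)


# Two unit-amplitude waves; only the phase of the second one changes.
for delta in (0.0, math.pi / 2, math.pi):
    z = superpose(1.0, 0.0, 1.0, delta)
    # The real part equals 1 + cos(delta): about 2, 1, and 0 respectively.
    print(f"phase difference {delta:.2f} rad -> real projection {z.real:.3f}")
```

With equal amplitudes, the projection ranges from full reinforcement when the waves are in phase to cancellation when they are half a cycle apart, which is the phase dependence the caption refers to.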