Joint Pedestrian Trajectory Prediction through Posterior Sampling

Haotian Lin, Yixiao Wang, Mingxiao Huo, Chensheng Peng, Zhiyuan Liu, Masayoshi Tomizuka

TL;DR

This work tackles the challenge of joint pedestrian trajectory prediction under noisy and incomplete history by introducing the Guided Full Trajectory Diffuser (GFTD), a diffusion-model framework that learns the joint distribution of full trajectories (historical and future) and enables posterior sampling for robust, controllable generation without extra training. By recasting prediction as a trajectory inpainting problem, GFTD leverages a pre-trained diffusion model and inference-time posterior guidance to softly enforce history constraints, while optionally supporting stronger conditioning via RePaint. The approach combines a latent trajectory representation, a graph-based denoiser, and a flexible guidance mechanism to achieve competitive joint-prediction performance and enhanced controllable generation, particularly in scenarios with noise or missing history. Overall, GFTD offers a robust, adaptable, and model-agnostic solution for real-world multi-agent navigation where data imperfections are common and task-specific guidance can be incorporated at inference time.

Abstract

Joint pedestrian trajectory prediction has long grappled with the inherent unpredictability of human behaviors. Recent investigations employing variants of conditional diffusion models in trajectory prediction have exhibited notable success. Nevertheless, the heavy dependence on accurate historical data results in their vulnerability to noise disturbances and data incompleteness. To improve the robustness and reliability, we introduce the Guided Full Trajectory Diffuser (GFTD), a novel diffusion model framework that captures the joint full (historical and future) trajectory distribution. By learning from the full trajectory, GFTD can recover the noisy and missing data, hence improving the robustness. In addition, GFTD can adapt to data imperfections without additional training requirements, leveraging posterior sampling for reliable prediction and controllable generation. Our approach not only simplifies the prediction process but also enhances generalizability in scenarios with noise and incomplete inputs. Through rigorous experimental evaluation, GFTD exhibits superior performance in both trajectory prediction and controllable generation.
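The soft history constraint behind posterior sampling can be illustrated with a single guidance update on a noisy full trajectory. The sketch below is not the authors' implementation: the function names, the random stand-in for the denoiser output, the 8-step history and 12-step future, and the simplification that the predicted noise is held constant when differentiating are all assumptions made for illustration.

```python
import numpy as np


def predict_x0(x_t, eps_hat, alpha_bar_t):
    """Standard DDPM estimate of the clean sample given the predicted noise."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)


def guide_with_history(x_t, eps_hat, alpha_bar_t, observed, mask, scale):
    """Take one gradient step on 0.5 * ||mask * (x0_hat - observed)||^2.

    Assumption: eps_hat is treated as constant with respect to x_t, so
    d(x0_hat)/d(x_t) = 1 / sqrt(alpha_bar_t). A full posterior-sampling
    update would differentiate through the denoiser as well.
    """
    x0_hat = predict_x0(x_t, eps_hat, alpha_bar_t)
    residual = mask * (x0_hat - observed)      # zero on future steps
    grad = residual / np.sqrt(alpha_bar_t)     # chain rule through predict_x0
    return x_t - scale * grad


# Toy setup (hypothetical sizes): 3 agents, 8 history + 12 future steps, 2D positions.
rng = np.random.default_rng(0)
num_agents, t_hist, t_fut = 3, 8, 12
x_t = rng.standard_normal((num_agents, t_hist + t_fut, 2))
eps_hat = rng.standard_normal(x_t.shape)       # stand-in for a trained denoiser's output

observed = np.zeros_like(x_t)
observed[:, :t_hist] = rng.standard_normal((num_agents, t_hist, 2))

mask = np.zeros((1, t_hist + t_fut, 1))        # 1 where history is observed
mask[:, :t_hist] = 1.0

alpha_bar_t = 0.5
x_guided = guide_with_history(x_t, eps_hat, alpha_bar_t, observed, mask, scale=0.25)
```

Because the constraint enters only through `mask`, missing history steps can be handled by setting their mask entries to zero, and the guidance scale controls how strongly a noisy observation is trusted. Future steps receive no direct correction in this update.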

Paper Structure

This paper contains 18 sections, 17 equations, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: Illustration of existing trajectory prediction frameworks and our Guided Full Trajectory Diffuser framework. (a) Multi-agent trajectory prediction methods directly generate entire future trajectories through supervised learning. (b) Diffusion-based multi-agent trajectory prediction methods generate future trajectories step by step during the denoising process. (c) Our method for multi-agent trajectory prediction incorporates history guidance within the diffusion framework. It predicts entire trajectories and supports additional capabilities in a single model, including Noise-Robust Trajectory Prediction, Incomplete Data Prediction, and Controllable Trajectory Generation.
  • Figure 2: Architecture of our proposed GFTD framework. During inference, GFTD starts from Gaussian noise and iteratively recovers the data with the denoising module $\epsilon_{\theta}$. The module takes noisy latent node features $x_t^i$ and edge features $e$ as input. After MLP encoding, the edge features are augmented by concatenating the encoded node features, and both are sent to the Processor, which consists of stacked Graph Attention (GAT) layers. The node outputs are then mapped to the same dimension as $x_t$ and combined through a residual connection, yielding the predicted intermediate noise $\epsilon_t$. After each denoising step, conditions are enforced through posterior sampling. Finally, the denoised latent nodes $x_0$ are converted to trajectory space.
  • Figure 3: Qualitative visualization of controllable generation. Red lines represent the ground truth trajectories. With the given goal point and history, our model can generate goal-oriented trajectories with considerable realism.
  • Figure 4: JADE/JFDE performance comparison between GFTD and GFTD+RePaint with noisy input data.
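
For readers unfamiliar with the RePaint variant compared in Figure 4, the core idea is to overwrite the known (history) part of each intermediate sample with a forward-noised copy of the observation, so the constraint is imposed directly rather than through a gradient. The following is a minimal sketch under stated assumptions: the function names, array shapes, and noise level are hypothetical, and RePaint's resampling (jumping back and forth in time) is omitted.

```python
import numpy as np


def q_sample(x0, alpha_bar, rng):
    """Forward diffusion: noise a clean sample to the level given by alpha_bar."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise


def repaint_merge(x_prev_generated, observed, mask, alpha_bar_prev, rng):
    """Combine the generated sample with a noised copy of the known history.

    mask is 1 on observed history steps and 0 elsewhere, so future steps
    always come from the model's own reverse-diffusion output.
    """
    x_prev_known = q_sample(observed, alpha_bar_prev, rng)
    return mask * x_prev_known + (1.0 - mask) * x_prev_generated


# Toy setup (hypothetical sizes): 2 agents, 8 history + 12 future steps, 2D positions.
rng = np.random.default_rng(0)
num_agents, t_hist, t_fut = 2, 8, 12
generated = rng.standard_normal((num_agents, t_hist + t_fut, 2))
observed = np.zeros_like(generated)
observed[:, :t_hist] = rng.standard_normal((num_agents, t_hist, 2))

mask = np.zeros((1, t_hist + t_fut, 1))
mask[:, :t_hist] = 1.0

# At the final step alpha_bar is 1, so the history is copied without noise.
merged = repaint_merge(generated, observed, mask, alpha_bar_prev=1.0, rng=rng)
```

This hard replacement copies whatever noise is present in the observed history into the sample, whereas gradient-based guidance can down-weight it through the guidance scale. The sketch does not reproduce the comparison in Figure 4.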