Joint Pedestrian Trajectory Prediction through Posterior Sampling

Haotian Lin; Yixiao Wang; Mingxiao Huo; Chensheng Peng; Zhiyuan Liu; Masayoshi Tomizuka

TL;DR

This work tackles the challenge of joint pedestrian trajectory prediction under noisy and incomplete history by introducing the Guided Full Trajectory Diffuser (GFTD), a diffusion-model framework that learns the joint distribution of full trajectories (historical and future) and enables posterior sampling for robust, controllable generation without extra training. By recasting prediction as a trajectory inpainting problem, GFTD leverages a pre-trained diffusion model and inference-time posterior guidance to softly enforce history constraints, while optionally supporting stronger conditioning via RePaint. The approach combines a latent trajectory representation, a graph-based denoiser, and a flexible guidance mechanism to achieve competitive joint-prediction performance and enhanced controllable generation, particularly in scenarios with noise or missing history. Overall, GFTD offers a robust, adaptable, and model-agnostic solution for real-world multi-agent navigation where data imperfections are common and task-specific guidance can be incorporated at inference time.

Abstract

Joint pedestrian trajectory prediction has long grappled with the inherent unpredictability of human behavior. Recent work applying variants of conditional diffusion models to trajectory prediction has shown notable success. However, these models depend heavily on accurate historical data, which leaves them vulnerable to noise and incomplete observations. To improve robustness and reliability, we introduce the Guided Full Trajectory Diffuser (GFTD), a novel diffusion model framework that captures the joint distribution of full (historical and future) trajectories. By learning from the full trajectory, GFTD can recover noisy and missing data, which improves robustness. In addition, GFTD adapts to data imperfections without additional training, leveraging posterior sampling for reliable prediction and controllable generation. Our approach not only simplifies the prediction process but also enhances generalizability in scenarios with noisy and incomplete inputs. Through rigorous experimental evaluation, GFTD exhibits superior performance in both trajectory prediction and controllable generation.
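The idea of treating prediction as inpainting over a full trajectory can be illustrated with a small sketch. The code below is not the paper's implementation: the function names, the tensor layout `(agents, steps, 2)`, the quadratic history loss, and the simplification of taking the gradient with respect to the clean estimate (rather than backpropagating through a denoiser) are all assumptions made for illustration. It contrasts a soft guidance step with a RePaint-style hard replacement of the observed entries.

```python
import numpy as np

def history_guidance_grad(x0_hat, history, mask):
    """Gradient of 0.5 * ||mask * (x0_hat - history)||^2 with respect to x0_hat."""
    return mask * (x0_hat - history)

def soft_guidance_step(x, x0_hat, history, mask, scale):
    """Nudge the current sample toward agreement with the observed history.

    Assumption: the guidance gradient is applied directly to x; a real
    posterior sampler would typically differentiate through the denoiser.
    """
    return x - scale * history_guidance_grad(x0_hat, history, mask)

def repaint_replace(x_t, history, mask, alpha_bar_t, rng):
    """RePaint-style hard conditioning: overwrite observed entries of x_t
    with a forward-noised copy of the history at the same noise level."""
    noise = rng.standard_normal(x_t.shape)
    noised_history = np.sqrt(alpha_bar_t) * history + np.sqrt(1.0 - alpha_bar_t) * noise
    return mask * noised_history + (1.0 - mask) * x_t

# Illustrative sizes (hypothetical): 3 agents, 8 observed steps, 12 future steps.
rng = np.random.default_rng(0)
num_agents, hist_len, fut_len = 3, 8, 12
full_len = hist_len + fut_len

# mask is 1 where a position was observed, 0 for future or missing steps.
mask = np.zeros((num_agents, full_len, 1))
mask[:, :hist_len] = 1.0
mask[0, 2:4] = 0.0  # simulate two missing history frames for agent 0

history = np.zeros((num_agents, full_len, 2))
history[:, :hist_len] = rng.standard_normal((num_agents, hist_len, 2))
history = history * mask  # unobserved entries carry no information

# Stand-in for the current sample; here it also serves as the clean estimate.
x = rng.standard_normal((num_agents, full_len, 2))
guided = soft_guidance_step(x, x, history, mask, scale=0.5)
```

In this sketch the soft step only shrinks the mismatch on observed entries and leaves future and missing entries to the generative model, whereas `repaint_replace` pins the observed entries to the (noised) history at every step. That difference is one way to read the trade-off between tolerance to noisy history and strict adherence to it.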

Paper Structure (18 sections, 17 equations, 4 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Pedestrian Trajectory Prediction
Joint Trajectory Distribution Modelling
Posterior Sampling for Inverse Problems
Methods
Preliminaries on Diffusion models
Problem Formulation
Represent Full Trajectory Distribution with Diffusion
Robust Prediction as Posterior Sampling
Trajectory Latent Representation
Framework Architecture
Experiments
Experimental Setups
Trajectory Prediction and Controllable Generation
...and 3 more sections

Figures (4)

Figure 1: Illustration of existing trajectory prediction frameworks and our Guided Full Trajectory Diffuser framework. (a) Multi-agent trajectory prediction methods directly generate entire future trajectories through supervised learning. (b) Diffusion-based multi-agent trajectory prediction methods generate future trajectories step by step during the denoising process. (c) Our method for multi-agent trajectory prediction incorporates history guidance within the diffusion framework. It predicts entire trajectories and supports additional capabilities in a single model, including noise-robust trajectory prediction, incomplete data prediction, and controllable trajectory generation.
Figure 2: Architecture of our proposed GFTD framework. During inference, GFTD samples from Gaussian noise and iteratively recovers data with the denoising module $\epsilon_{\theta}$. The module takes in noisy latent node features $x_t^i$ and edge features $e$. After MLP encoding, each edge is augmented by concatenating the encoded node features, and both are sent into the Processor, which consists of stacked Graph Attention (GAT) layers. We then map the nodes to the same dimension as $x_t$ and add a residual connection, resulting in the predicted intermediate noise $\epsilon_t$. After each denoising step, we address conditions through posterior sampling. Finally, the denoised latent nodes $x_0$ are converted to trajectory space.
Figure 3: Qualitative visualization of controllable generation. Red lines represent the ground truth trajectories. With the given goal point and history, our model can generate goal-oriented trajectories with considerable realism.
Figure 4: JADE/JFDE performance comparison between GFTD and GFTD+RePaint with noisy input data.
