Uncovering the human motion pattern: Pattern Memory-based Diffusion Model for Trajectory Prediction

Yuxin Yang, Pengfei Zhu, Mengshi Qi, Huadong Ma

TL;DR

This paper tackles human trajectory forecasting under inherent uncertainty by introducing the Motion Pattern Priors Memory Network (MP$^2$MNet), a memory-augmented diffusion framework. It builds a memory bank of clustered motion patterns from the training data. For each prediction it retrieves the matched pattern and a target distribution, which together form a target priors memory token that conditions a Transformer-based decoder in the reverse diffusion process. The model combines an encoder (from Trajectron++), memory-based pattern guidance, and a target-guided diffusion objective to generate diverse and plausible future trajectories. Results on ETH/UCY and the Stanford Drone Dataset show state-of-the-art performance, and ablations attribute average reductions of 11.5% in ADE (average displacement error) and 12% in FDE (final displacement error) to the memory priors.
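
The two metrics quoted above can be made concrete with a small sketch. The figure captions mention best-of-20 evaluation, so the example takes the minimum error over a set of sampled futures. The function names and toy trajectories are illustrative assumptions, not the authors' evaluation code, and taking the two minima independently is one common convention rather than a confirmed detail of this paper.

```python
import math

def ade(pred, gt):
    """Average displacement error: mean Euclidean distance over all future steps."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def fde(pred, gt):
    """Final displacement error: Euclidean distance at the last future step."""
    return math.dist(pred[-1], gt[-1])

def best_of_k(samples, gt):
    """Minimum ADE and minimum FDE over K sampled futures (minima taken independently)."""
    return min(ade(s, gt) for s in samples), min(fde(s, gt) for s in samples)

def relative_reduction(baseline, improved):
    """Percentage by which `improved` lowers `baseline`."""
    return 100.0 * (baseline - improved) / baseline

# Toy data, invented for illustration only.
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
samples = [
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],  # offset by 1 at every step
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)],  # exact except at the last step
]
min_ade, min_fde = best_of_k(samples, gt)
print(min_ade, min_fde)
```

A percentage such as the reported ADE reduction would then correspond to `relative_reduction(error_without_memory, error_with_memory)` averaged over the evaluation scenes.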

Abstract

Human trajectory forecasting is a critical challenge in fields such as robotics and autonomous driving. Due to the inherent uncertainty of human actions and intentions in real-world scenarios, various unexpected occurrences may arise. To uncover latent motion patterns in human behavior, we introduce a novel memory-based method, named Motion Pattern Priors Memory Network. Our method constructs a memory bank from clustered prior knowledge of the motion patterns observed in the training set trajectories. We introduce an addressing mechanism that retrieves the matched pattern and the potential target distributions for each prediction from the memory bank, which enables the identification and retrieval of natural motion patterns exhibited by agents. The resulting target priors memory token then guides the diffusion model to generate predictions. Extensive experiments validate the effectiveness of our approach, achieving state-of-the-art trajectory prediction accuracy. The code will be made publicly available.
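
As a rough illustration of the memory bank and addressing mechanism described in the abstract, the sketch below clusters training patterns, stores endpoint statistics per cluster, and retrieves the closest slot for a query. Everything specific here is an assumption made for the example: k-means as the clustering method, Euclidean nearest-neighbour addressing, a Gaussian summary of targets, and all function names and data. The abstract does not state which choices the authors make.

```python
import math
import random
import statistics

def nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance, an assumption)."""
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; stands in for whatever clustering the paper uses."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old center if a cluster goes empty
                centers[c] = tuple(statistics.fmean(col) for col in zip(*members))
    return centers

def build_memory(patterns, targets, k):
    """One slot per cluster: a pattern prototype plus endpoint statistics."""
    centers = kmeans(patterns, k)
    memory = []
    for c, center in enumerate(centers):
        ends = [t for p, t in zip(patterns, targets) if nearest(p, centers) == c]
        if not ends:
            continue
        xs, ys = zip(*ends)
        memory.append({
            "pattern": center,
            "target_mean": (statistics.fmean(xs), statistics.fmean(ys)),
            "target_std": (statistics.pstdev(xs), statistics.pstdev(ys)),
        })
    return memory

def retrieve(memory, query):
    """Address the bank: return the slot whose pattern best matches the query."""
    return memory[nearest(query, [slot["pattern"] for slot in memory])]

def sample_target(slot, rng):
    """Draw one candidate target from the slot's (assumed Gaussian) target distribution."""
    return tuple(rng.gauss(m, s) for m, s in zip(slot["target_mean"], slot["target_std"]))

# Toy data: two motion patterns (heading along x, heading along y) with their endpoints.
patterns = [(1.0, 0.0), (1.1, 0.0), (0.9, 0.1), (0.0, 1.0), (0.0, 1.1), (0.1, 0.9)]
targets = [(10.0, 0.0), (11.0, 0.0), (9.0, 1.0), (0.0, 10.0), (0.0, 11.0), (1.0, 9.0)]
memory = build_memory(patterns, targets, k=2)
slot = retrieve(memory, (1.0, 0.05))
print(slot["target_mean"], sample_target(slot, random.Random(1)))
```

In the full model the retrieved pattern and sampled target would be embedded into the target priors memory token; here they are left as plain tuples to keep the sketch short.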

Paper Structure (11 sections, 13 equations, 2 figures, 2 tables)

This paper contains 11 sections, 13 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overview of the proposed MP$^2$MNet method. It comprises an encoder, the motion pattern priors memory bank, and a Transformer-based decoder. The encoder produces the motion state representation. $S$ denotes the total number of diffusion steps and $s$ the $s$-th step. $Y^s$ is the ground truth $Y^0$ after $s$ steps of added noise. The decoder takes $Y^s$ together with the motion state embedding, the target priors memory token, and the time embedding, and generates the output. Training minimizes the mean squared error (MSE) between this output and the Gaussian noise variable, using target-guided diffusion generation at each step $s$ to optimize the network.
  • Figure 2: Qualitative comparison on the ETH/UCY datasets. We compare the best-of-20 predictions of our approach with those of two baselines: the earlier MID method (Gu et al., 2022) and our method without the motion pattern priors memory. Ground-truth trajectories are shown as red solid lines, past trajectories as dark blue solid lines, and prediction results as light blue lines. All 20 predictions for each agent are drawn as light blue dashed lines, with the corresponding targets marked by stars. The results show clear improvements from the memory-based method.
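
The training step described in the Figure 1 caption (corrupt $Y^0$ to $Y^s$, then regress the added noise with an MSE loss) can be sketched as follows. The sketch assumes the standard closed-form Gaussian corruption used in denoising diffusion models and a linear noise schedule with made-up endpoints; neither is confirmed by the caption. The `oracle_denoiser` is a hypothetical stand-in for the Transformer decoder that simply inverts the corruption, so the loss it produces is near zero by construction.

```python
import math
import random

def alpha_bars(S, beta_start=1e-4, beta_end=0.05):
    """Cumulative products of (1 - beta) for a linear schedule (assumed values)."""
    bars, prod = [], 1.0
    for i in range(S):
        beta = beta_start + (beta_end - beta_start) * i / max(S - 1, 1)
        prod *= 1.0 - beta
        bars.append(prod)
    return bars

def corrupt(y0, s, bars, rng):
    """Jump from Y^0 to Y^s in closed form; returns (Y^s, eps)."""
    a = bars[s - 1]  # diffusion steps are numbered 1..S
    eps = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in y0]
    ys = [
        tuple(math.sqrt(a) * v + math.sqrt(1.0 - a) * e for v, e in zip(point, noise))
        for point, noise in zip(y0, eps)
    ]
    return ys, eps

def noise_mse(eps_hat, eps):
    """Mean squared error between predicted and true noise."""
    diffs = [(p - e) ** 2 for ph, et in zip(eps_hat, eps) for p, e in zip(ph, et)]
    return sum(diffs) / len(diffs)

def oracle_denoiser(ys, y0, s, bars):
    """Hypothetical stand-in for the decoder: recovers eps using the known Y^0."""
    a = bars[s - 1]
    return [
        tuple((v - math.sqrt(a) * u) / math.sqrt(1.0 - a) for v, u in zip(noisy, clean))
        for noisy, clean in zip(ys, y0)
    ]

rng = random.Random(0)
S = 100
bars = alpha_bars(S)
y0 = [(0.5, 0.0), (1.0, 0.1), (1.5, 0.3)]  # toy future trajectory
s = rng.randint(1, S)                      # one random step per training iteration
ys, eps = corrupt(y0, s, bars, rng)
loss = noise_mse(oracle_denoiser(ys, y0, s, bars), eps)
print(s, loss)
```

A real decoder would not see $Y^0$; it would predict the noise from $Y^s$, the motion state embedding, the target priors memory token, and the time embedding, and the same `noise_mse` value would be backpropagated.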