AGMA: Adaptive Gaussian Mixture Anchors for Prior-Guided Multimodal Human Trajectory Forecasting

Chao Li, Rui Zhang, Siyuan Huang, Xian Zhong, Hongbo Jiang

TL;DR

AGMA addresses the core bottleneck in multimodal human trajectory forecasting: misaligned priors. By first extracting batch-specific priors through graph-based clustering and then distilling them into a scene-adaptive global GMM via optimal transport and cross-attention, AGMA explicitly optimizes prior quality rather than relying on fixed or implicitly learned priors. Theoretical analysis links prior-sampler interactions to distribution matching accuracy and demonstrates that high-quality priors are necessary for faithful multimodal predictions. Empirically, AGMA achieves state-of-the-art results on ETH-UCY, SDD, and JRDB, validating the practical impact of explicit prior optimization for autonomous navigation and related AI systems.
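The distillation step can be pictured as an entropic optimal-transport problem between batch-level mixture components and a small set of global anchors. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the squared-Euclidean cost, the Sinkhorn solver, the barycentric update, and every name (`sinkhorn_plan`, `distill_anchors`, `reg`) are hypothetical choices, and the cross-attention refinement is omitted entirely.

```python
import numpy as np

def sinkhorn_plan(cost, a, b, reg=0.1, n_iters=200):
    """Entropic OT plan between histograms a (n,) and b (m,) for a cost matrix (n, m)."""
    K = np.exp(-cost / reg)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # rescale columns toward marginal b
        u = a / (K @ v)                  # rescale rows toward marginal a
    return u[:, None] * K * v[None, :]

def distill_anchors(batch_means, batch_weights, global_means, reg=0.1):
    """Move each global anchor to the transport-weighted mean of batch components.

    Assumption: uniform mass on the global anchors and a squared-Euclidean cost.
    """
    diff = batch_means[:, None, :] - global_means[None, :, :]
    cost = (diff ** 2).sum(axis=-1)
    cost = cost / (cost.max() + 1e-12)   # normalise so exp(-cost / reg) stays well scaled
    m = global_means.shape[0]
    b = np.full(m, 1.0 / m)
    plan = sinkhorn_plan(cost, batch_weights, b, reg=reg)
    # Barycentric projection: weighted average of the batch means sent to each anchor.
    new_means = (plan.T @ batch_means) / plan.sum(axis=0)[:, None]
    return new_means, plan

# Hypothetical toy data: two tight clusters of batch-level component means.
batch_means = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 5.2]])
batch_weights = np.full(4, 0.25)
global_means = np.array([[1.0, 1.0], [4.0, 4.0]])
new_means, plan = distill_anchors(batch_means, batch_weights, global_means)
print(new_means)
```

In this toy setup each anchor is pulled toward the cluster of batch components closest to it, which is the intuition behind aggregating many batch-specific priors into one global mixture.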

Abstract

Human trajectory forecasting requires capturing the multimodal nature of pedestrian behavior. However, existing approaches suffer from prior misalignment. Their learned or fixed priors often fail to capture the full distribution of plausible futures, limiting both prediction accuracy and diversity. We theoretically establish that prediction error is lower-bounded by prior quality, making prior modeling a key performance bottleneck. Guided by this insight, we propose AGMA (Adaptive Gaussian Mixture Anchors), which constructs expressive priors through two stages: extracting diverse behavioral patterns from training data and distilling them into a scene-adaptive global prior for inference. Extensive experiments on ETH-UCY, Stanford Drone, and JRDB datasets demonstrate that AGMA achieves state-of-the-art performance, confirming the critical role of high-quality priors in trajectory forecasting.
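To make "prior misalignment" concrete, the toy sketch below estimates a KL divergence between a bimodal reference prior and two candidate priors: a collapsed single Gaussian and a slightly shifted mixture. This is only an illustration under assumptions: the abstract does not say which divergence the paper uses, so KL is a stand-in, the latent is one-dimensional, and all names and numbers (`gmm_logpdf`, `mc_kl`, the mixture parameters) are hypothetical.

```python
import numpy as np

def gmm_logpdf(z, weights, means, stds):
    """Log-density of a 1-D Gaussian mixture at points z (log-sum-exp for stability)."""
    z = np.asarray(z)[:, None]
    log_comp = (np.log(weights) - 0.5 * np.log(2 * np.pi) - np.log(stds)
                - 0.5 * ((z - means) / stds) ** 2)
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True)))[:, 0]

def gmm_sample(rng, n, weights, means, stds):
    """Draw n samples: pick a component per sample, then sample from that Gaussian."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[comp], stds[comp])

def mc_kl(rng, p, q, n=20000):
    """Monte Carlo estimate of KL(p || q) using samples drawn from p."""
    z = gmm_sample(rng, n, *p)
    return float(np.mean(gmm_logpdf(z, *p) - gmm_logpdf(z, *q)))

# Each prior is a (weights, means, stds) tuple; all values are illustrative.
true_prior = (np.array([0.5, 0.5]), np.array([-2.0, 2.0]), np.array([0.5, 0.5]))
collapsed = (np.array([1.0]), np.array([0.0]), np.array([2.06]))   # one broad mode
matched = (np.array([0.5, 0.5]), np.array([-1.9, 2.1]), np.array([0.5, 0.5]))

rng = np.random.default_rng(0)
print("collapsed prior:", mc_kl(rng, true_prior, collapsed))
print("matched mixture:", mc_kl(rng, true_prior, matched))
```

The collapsed prior spreads its mass over the gap between the two modes, so its estimated divergence from the reference is much larger than that of the mixture whose modes sit close to the true ones.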

Paper Structure

This paper contains 36 sections, 3 theorems, 45 equations, 3 figures, 4 tables.

Key Result

Theorem 3.1

For a given scene with observed trajectories $X$ and target agent $j$, define the prior error $\epsilon_{\mathrm{prior}}(X)$ and the sampler error $\epsilon_{\mathrm{sample}}(X)$, where $\epsilon_{\mathrm{prior}}(X)$ measures the distributional mismatch between the true prior $p(z|X)$ and the learned prior $q(z|X)$, and $\epsilon_{\mathrm{sample}}(X)$ quantifies the sampler's reconstruction accuracy. The prediction loss is then bounded in terms of these two errors.

Figures (3)

  • Figure 1: Qualitative comparison of priors at a three-way intersection with two agents: (a) Implicit Gaussian priors: Simple Gaussian priors collapse to a single dominant mode during training, yielding limited diversity. (b) Discrete anchor priors: Fixed discrete priors produce repetitive predictions that fail to adapt to scene-specific contexts. (c) AGMA (Ours): Adaptive Gaussian Mixture Anchors generate diverse, semantically aligned predictions through explicit, scene-aware prior construction.
  • Figure 2: AGMA architecture. Left: Graph-based clustering discovers behavioral patterns within each batch, forming batch-level GMM priors. Right: Optimal transport distills batch priors into a global GMM, refined via trajectory prediction with a shared decoder.
  • Figure 3: Sensitivity to batch size and Top-K. (a) Performance degrades with larger batches, demonstrating that finer batch subdivision better captures localized priors and prevents over-smoothing of the global distribution. (b) Minimal sensitivity to $K$ indicates that the model does not overfit $p(Y|z,X)$ to compensate for a collapsed $p(z|X)$: the learned prior remains informative across sampling budgets, confirming effective prior learning without mode collapse. Results are on the ETH dataset.
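
Multimodal forecasters are commonly scored with best-of-K displacement errors, where K is the sampling budget that Figure 3(b) varies. Assuming that protocol applies here (the captions above do not specify the metric), a minimal sketch of the computation could look like this; `best_of_k_errors` and the toy trajectories are hypothetical.

```python
import numpy as np

def best_of_k_errors(pred, gt):
    """Best-of-K displacement errors.

    pred: (K, T, 2) array of K sampled future trajectories.
    gt:   (T, 2) ground-truth future trajectory.
    Returns (minADE, minFDE) over the K samples.
    """
    dist = np.linalg.norm(pred - gt[None], axis=-1)   # (K, T) per-step distances
    ade = dist.mean(axis=1)                           # average error per sample
    fde = dist[:, -1]                                 # final-step error per sample
    return float(ade.min()), float(fde.min())

# Toy example: a straight-line ground truth and noisy samples around it.
rng = np.random.default_rng(0)
T = 12
gt = np.stack([np.arange(T, dtype=float), np.zeros(T)], axis=-1)
samples = gt[None] + rng.normal(scale=0.5, size=(20, T, 2))

for k in (5, 10, 20):
    print(k, best_of_k_errors(samples[:k], gt))
```

Because the minimum is taken over a growing set of samples, the reported error can only stay the same or decrease as K increases, which is why sensitivity to K says something about how much each individual sample contributes.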

Theorems & Definitions (10)

  • Theorem 3.1
  • proof
  • Proposition 3.2
  • proof
  • Remark 3.3
  • Corollary 3.4
  • proof
  • proof
  • proof
  • proof