PlayGen-MoG: Framework for Diverse Multi-Agent Play Generation via Mixture-of-Gaussians Trajectory Prediction

Kevin Song

Abstract

Multi-agent trajectory generation in team sports requires models that capture both the diversity of possible plays and realistic spatial coordination between players. Standard generative approaches such as Conditional Variational Autoencoders (CVAEs) and diffusion models struggle with this task, exhibiting posterior collapse or convergence to the dataset mean. Moreover, most trajectory prediction methods operate in a forecasting regime that requires multiple frames of observed history, which limits their use for play design, where only the initial formation is available. We present PlayGen-MoG, an extensible framework for formation-conditioned play generation that addresses these challenges through three design choices: (1) a Mixture-of-Gaussians (MoG) output head with mixture weights shared across all agents, so that a single set of weights selects a play scenario that couples all players' trajectories; (2) relative spatial attention that encodes pairwise player positions and distances as learned attention biases; and (3) non-autoregressive prediction of absolute displacements from the initial formation, which eliminates cumulative error drift and removes the dependence on observed trajectory history, enabling realistic play generation from a single static formation. On American football tracking data, PlayGen-MoG achieves 1.68 yard ADE and 3.98 yard FDE while maintaining full utilization of all 8 mixture components, with an entropy of 2.06 out of a maximum of 2.08. Qualitative results confirm diverse generation without mode collapse.
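To make the reported numbers easier to read, here is a minimal sketch of the three quantities the abstract mentions. It assumes the textbook definitions: ADE as the mean Euclidean error over all agents and frames, FDE as the mean Euclidean error at the final frame, and Shannon entropy of the mixture weights. The abstract's maximum of 2.08 is consistent with entropy measured in nats over 8 components, since ln 8 ≈ 2.079. The function names `ade_fde` and `mixture_entropy` are hypothetical, and the paper's exact evaluation protocol may differ.

```python
import math


def mixture_entropy(weights):
    """Shannon entropy (in nats) of a list of mixture weights."""
    return -sum(w * math.log(w) for w in weights if w > 0)


def ade_fde(pred, truth):
    """Average and final displacement error.

    pred, truth: one trajectory per agent, each a list of (x, y) points
    with one point per frame. Units follow the inputs (e.g. yards).
    """
    # Per-agent, per-frame Euclidean distance between prediction and truth.
    dists = [
        [math.dist(p, q) for p, q in zip(pred_traj, true_traj)]
        for pred_traj, true_traj in zip(pred, truth)
    ]
    ade = sum(sum(d) for d in dists) / sum(len(d) for d in dists)
    fde = sum(d[-1] for d in dists) / len(dists)
    return ade, fde


# Uniform weights over 8 components give the maximum entropy, ln(8).
max_entropy = mixture_entropy([1 / 8] * 8)

# Toy example (made-up data): one agent, two frames, final point off by 5.
ade, fde = ade_fde([[(0.0, 0.0), (3.0, 4.0)]], [[(0.0, 0.0), (0.0, 0.0)]])
print(round(max_entropy, 3), ade, fde)  # 2.079 2.5 5.0
```

Under these definitions, an entropy of 2.06 against a ceiling of about 2.08 means the average mixture weights are close to uniform, which is what "full utilization" of the 8 components suggests.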

Paper Structure

This paper contains 39 sections, 10 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: PlayGen-MoG training and generation overview. (A) Model architecture. Initial formation and role IDs are encoded by a full-attention formation encoder. The input projection maps formation (replicated across all $T{-}1$ frames) and sinusoidal step embeddings to hidden representations. A stack of $L$ SRTE blocks applies relative spatial attention with pairwise distance biases, followed by cross-attention to the formation embeddings (Q = agents, K/V = formation). Bidirectional temporal attention pools agent features per frame, applies full self-attention across frames, and broadcasts the temporal context back to each agent. The MoG output head produces shared mixture weights $\boldsymbol{\pi}$, per-agent displacement means $\boldsymbol{\mu}$, and Cholesky-parameterized covariances $\mathbf{L}$, which are used to generate the final trajectory. (B) Play generation. The model performs a single forward pass with formation replicated across $T{-}1$ frames. A global mixture component $k$ is selected by sampling from the time-averaged weights $\bar{\boldsymbol{\pi}}$. Positions are reconstructed as absolute displacements from formation: $\mathbf{x}_i^{(t)} = \mathbf{f}_i + \boldsymbol{\mu}_{ik}^{(t)} + \mathbf{L}_{ik}^{(t)}\boldsymbol{\epsilon}$, eliminating cumulative drift.
  • Figure 1: Qualitative comparison of generative baselines. Each row shows three independent samples from the same formation. Top (CVAE): posterior collapse, where all samples are nearly identical despite different latent draws. Middle (LED): diffusion produces high-variance, spatially incoherent trajectories spanning the full field. Bottom (PlayGen-MoG): each sample represents a distinct, realistic play concept with coordinated player motion.
  • Figure 2: Formation-conditioned play generation at temperature 1.0 across three personnel groupings. Each row shows a different formation type: 3WR/1TE/1RB tight (top), 3WR/1TE/1RB spread (middle), and 2WR/2TE/1RB 12-personnel (bottom). The leftmost column shows ground truth; columns 2 to 4 show three independent samples from PlayGen-MoG. Players are colored by position group (see legend). All panels in each row share the same axis scale. The model generates diverse play outcomes that match the scale and structure of real American football plays.
  • Figure 2: A single generated play shown at increasing prediction horizons. Circles mark starting positions; diamonds mark endpoints at each horizon. Trajectory width tapers to indicate direction of movement. Routes become progressively distinguishable as the horizon extends.
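
The generation step described in the overview caption (part B) can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: the function name `sample_play`, the tensor layouts, and the toy inputs are all assumptions, and it further assumes that an independent noise vector $\boldsymbol{\epsilon}$ is drawn for every agent and frame. Temperature scaling is not modeled, since the captions do not say how it is applied.

```python
import numpy as np


def sample_play(formation, pi, mu, chol, rng):
    """Sample one play from a shared-weight Mixture-of-Gaussians head.

    Assumed (hypothetical) layouts, with S = T - 1 predicted frames:
      formation: (N, 2)          initial positions f_i
      pi:        (S, K)          shared mixture weights per frame
      mu:        (S, N, K, 2)    per-agent displacement means
      chol:      (S, N, K, 2, 2) lower-triangular Cholesky factors L
    """
    # One global component for the whole play, drawn from time-averaged weights.
    pi_bar = pi.mean(axis=0)
    pi_bar = pi_bar / pi_bar.sum()
    k = int(rng.choice(len(pi_bar), p=pi_bar))

    mu_k = mu[:, :, k]      # (S, N, 2)
    chol_k = chol[:, :, k]  # (S, N, 2, 2)

    # Assumption: independent standard-normal noise per agent and frame.
    eps = rng.standard_normal(mu_k.shape)
    noise = np.einsum("snij,snj->sni", chol_k, eps)

    # Absolute displacement from the formation: x = f + mu + L @ eps.
    positions = formation[None, :, :] + mu_k + noise
    return k, positions


# Toy inputs (random placeholders, not real model outputs).
rng = np.random.default_rng(0)
steps, n_agents, n_comp = 4, 3, 8
formation = rng.uniform(0.0, 10.0, size=(n_agents, 2))
pi = rng.dirichlet(np.ones(n_comp), size=steps)
mu = rng.normal(size=(steps, n_agents, n_comp, 2))
chol = np.broadcast_to(0.1 * np.eye(2), (steps, n_agents, n_comp, 2, 2))

k, play = sample_play(formation, pi, mu, chol, rng)
print(k, play.shape)
```

Because every frame's position is computed directly from the formation rather than from the previous frame, an error at one step does not feed into later steps, which is the sense in which the caption says cumulative drift is eliminated. Drawing a single `k` for all agents is what couples the players' trajectories into one play scenario.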