JointDiff: Bridging Continuous and Discrete in Multi-Agent Trajectory Generation
Guillem Capellera, Luis Ferraz, Antonio Rubio, Alexandre Alahi, Antonio Agudo
TL;DR
JointDiff addresses the gap between continuous trajectories and synchronous discrete events by introducing a joint diffusion framework for dynamic multi-agent scenes. It unifies the forward diffusion of continuous coordinates and discrete events and learns a single reverse model with two heads. Controllable generation is supported through weak-possessor-guidance and natural-language text guidance, both conditioned via CrossGuid. The approach achieves state-of-the-art results on completion and controllable generation across basketball, football, and soccer datasets, and shows better scene-level coherence and consistency than absorbing-state baselines. This work advances interactive, controllable high-dimensional generation in sports analytics and multi-agent simulation, with potential extensions to sparse events and broader data modalities.
Abstract
Generative models often treat continuous data and discrete events as separate processes, creating a gap in modeling complex systems where they interact synchronously. To bridge this gap, we introduce JointDiff, a novel diffusion framework designed to unify these two processes by simultaneously generating continuous spatio-temporal data and synchronous discrete events. We demonstrate its efficacy in the sports domain by simultaneously modeling multi-agent trajectories and key possession events. This joint modeling is validated with non-controllable generation and two novel controllable generation scenarios: weak-possessor-guidance, which offers flexible semantic control over game dynamics through a simple list of intended ball possessors, and text-guidance, which enables fine-grained, language-driven generation. To condition generation on these guidance signals, we introduce CrossGuid, an effective conditioning operation for multi-agent domains. We also share a new unified sports benchmark enhanced with textual descriptions for soccer and football datasets. JointDiff achieves state-of-the-art performance, demonstrating that joint modeling is crucial for building realistic and controllable generative models for interactive systems.
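
As a rough illustration of what corrupting continuous and discrete data under one shared timestep can look like, the sketch below noises agent coordinates with a standard Gaussian step and possessor labels with uniform resampling. This is a minimal sketch under stated assumptions, not the method described above: the cosine schedule, the uniform-resampling kernel, and the names `alpha_bar` and `joint_forward` are all hypothetical choices made for illustration, and the two-head reverse model is not shown.

```python
import math
import random


def alpha_bar(t, num_steps):
    """Cumulative signal level in [0, 1]. The cosine shape is an assumption."""
    return math.cos(0.5 * math.pi * t / num_steps) ** 2


def joint_forward(coords, possessors, t, num_steps, num_agents, rng):
    """Corrupt coordinates and possessor labels at the same diffusion step t.

    coords:     list of frames, each a list of (x, y) tuples, one per agent
    possessors: list of agent indices, one per frame (the discrete event)
    """
    a = alpha_bar(t, num_steps)
    signal, noise = math.sqrt(a), math.sqrt(1.0 - a)

    # Continuous branch: scale the clean value and add Gaussian noise.
    noisy_coords = [
        [(signal * x + noise * rng.gauss(0, 1),
          signal * y + noise * rng.gauss(0, 1))
         for (x, y) in frame]
        for frame in coords
    ]

    # Discrete branch (assumed kernel): keep the label with probability a,
    # otherwise resample it uniformly over the agents.
    noisy_events = [
        p if rng.random() < a else rng.randrange(num_agents)
        for p in possessors
    ]
    return noisy_coords, noisy_events


rng = random.Random(0)
clean_coords = [[(0.1, 0.2), (0.3, 0.4)], [(0.5, 0.6), (0.7, 0.8)]]
clean_events = [0, 1]
noisy_coords, noisy_events = joint_forward(clean_coords, clean_events, 5, 10, 2, rng)
```

Because both branches read the same `a`, the coordinates and the possession labels lose information at the same rate, which is the property a single reverse model would rely on when denoising them together.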
