PlayBest: Professional Basketball Player Behavior Synthesis via Planning with Diffusion
Xiusi Chen, Wei-Yao Wang, Ziniu Hu, David Reynoso, Kun Jin, Mingyan Liu, P. Jeffrey Brantingham, Wei Wang
TL;DR
PlayBest tackles the challenge of planning in dynamic basketball environments by extending diffusion probabilistic models to learn environmental dynamics from NBA player motion-tracking data and by guiding trajectory generation with a learned value function. The framework pairs a diffusion-based environment model with a reward predictor and uses classifier-guided (gradient-based) sampling to steer multi-agent trajectories toward high-reward plays, enabling offline planning without online simulation. Empirical results on NBA data show that PlayBest outperforms offline RL baselines and that its plays align with professional tactics, indicating that it captures complex spatio-temporal game dynamics and generates strategic plays. The work advances data-driven synthesis of sports strategy and may extend to other competitive, dynamic systems.
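For concreteness, the two ingredients named above (a value function trained on rewards and value-guided sampling) are commonly written as follows in Diffuser-style planners. This is a sketch under that assumption; PlayBest's exact parameterization, reward definition, and discount are not stated here and may differ. The notation is assumed: τ^i is a trajectory at diffusion step i (τ^0 is clean), H is the planning horizon, r_t are the per-step rewards of τ^0, γ is a discount factor, ᾱ_i is the cumulative noise-schedule product, and α is a guidance scale.

```latex
% Value function J_phi regressed onto the discounted return of the clean trajectory,
% evaluated on noised trajectories so it can guide every reverse step.
\mathcal{L}(\phi) = \mathbb{E}_{\tau^0,\, i,\, \epsilon}\Big[\Big(\mathcal{J}_\phi(\tau^i, i) - \sum_{t=0}^{H-1} \gamma^{t} r_t\Big)^2\Big],
\qquad \tau^i = \sqrt{\bar\alpha_i}\,\tau^0 + \sqrt{1-\bar\alpha_i}\,\epsilon,\quad \epsilon \sim \mathcal{N}(0, I)

% Classifier-guided reverse step: shift the denoiser mean along the value gradient.
\tilde p_\theta(\tau^{i-1} \mid \tau^i) \approx \mathcal{N}\Big(\tau^{i-1};\ \mu_\theta(\tau^i, i) + \alpha\,\Sigma_i\,\nabla_{\tau}\mathcal{J}_\phi(\tau, i)\big|_{\tau=\mu_\theta(\tau^i, i)},\ \Sigma_i\Big)
```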
Abstract
Dynamic planning in complex systems has been explored to improve decision-making in various domains. Professional basketball is a compelling example of a dynamic spatio-temporal game that involves context-dependent decision-making. However, the diversity of on-court signals and the vast space of potential actions and outcomes make it difficult for existing approaches to swiftly identify optimal strategies as circumstances evolve. In this study, we formulate the sequential decision-making process as conditional trajectory generation. Based on this formulation, we introduce PlayBest (PLAYer BEhavior SynThesis), a method to improve player decision-making. We extend the diffusion probabilistic model to learn challenging environmental dynamics from historical National Basketball Association (NBA) player motion-tracking data. To incorporate data-driven strategies, we train an auxiliary value function on the corresponding rewards. To achieve reward-guided trajectory generation, we condition the diffusion model on the value function via classifier-guided sampling. We validate the effectiveness of PlayBest through simulation studies, comparing the generated trajectories with those employed by professional basketball teams. Our results show that the model generates reasonable basketball trajectories that produce efficient plays. Moreover, the synthesized play strategies align with professional tactics, highlighting the model's capacity to capture the intricate dynamics of basketball games.
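A minimal NumPy sketch of value-guided (classifier-guided) sampling as the abstract describes it, assuming a DDPM-style reverse process with Σ_i = β_i I and a guidance term added to the denoiser mean, as in Diffuser-style planners. Everything here is a hypothetical placeholder: the names `guided_sample`, `toy_eps_model`, and `toy_value_grad`, the linear noise schedule, the toy denoiser, and the linear value function. PlayBest's trained networks, its encoding of players and ball, and its rewards are not reproduced.

```python
import numpy as np


def make_schedule(n_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, n_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_eps_model(traj, alpha_bar):
    """Hypothetical stand-in for the learned noise predictor eps_theta.

    This is the exact conditional-mean noise estimate when clean trajectories
    are i.i.d. standard normal; a real model would be a trained network.
    """
    return np.sqrt(1.0 - alpha_bar) * traj


def toy_value_grad(traj, w):
    """Gradient of a hypothetical linear value J(traj) = sum(w * traj).

    The gradient of a linear function is constant, so `traj` is unused here.
    """
    return w


def guided_sample(w, horizon=8, dim=2, n_steps=50, guidance_scale=1.0, seed=0):
    """Classifier-guided reverse diffusion over a (horizon, dim) trajectory."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(n_steps)
    traj = rng.standard_normal((horizon, dim))  # start from pure noise tau^N
    for i in reversed(range(n_steps)):
        eps = toy_eps_model(traj, alpha_bars[i])
        # Unguided DDPM posterior mean mu_theta(tau^i, i).
        mean = (traj - betas[i] / np.sqrt(1.0 - alpha_bars[i]) * eps) / np.sqrt(alphas[i])
        var = betas[i]  # Sigma_i = beta_i * I (one common DDPM choice)
        # Guidance: shift the mean along the value gradient, scaled by Sigma_i.
        mean = mean + guidance_scale * var * toy_value_grad(mean, w)
        # No noise is added on the final step, so the output is the guided mean.
        noise = rng.standard_normal(traj.shape) if i > 0 else 0.0
        traj = mean + np.sqrt(var) * noise
    return traj


w = np.ones((8, 2))
plain = guided_sample(w, guidance_scale=0.0)
steered = guided_sample(w, guidance_scale=10.0)
print(np.sum(w * steered) > np.sum(w * plain))  # True
```

Setting `guidance_scale=0.0` recovers unguided sampling. Because both runs consume the same random draws, the steered trajectory's value exceeds the unguided one by a deterministic positive margin in this toy; with a learned, nonlinear value function the gradient would instead depend on the current trajectory.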
