PlayBest: Professional Basketball Player Behavior Synthesis via Planning with Diffusion

Xiusi Chen, Wei-Yao Wang, Ziniu Hu, David Reynoso, Kun Jin, Mingyan Liu, P. Jeffrey Brantingham, Wei Wang

TL;DR

PlayBest tackles the challenge of planning in dynamic basketball environments by extending diffusion probabilistic models to learn environmental dynamics from NBA motion-tracking data and by guiding trajectory generation with a learned value function. The framework integrates a diffusion-based environmental model with a reward predictor and employs classifier-guided (gradient-based) sampling to steer multi-agent trajectories toward high-reward plays, enabling offline planning without online simulation. Empirical results on NBA data show that PlayBest outperforms offline RL baselines and aligns with professional tactics, supporting its capacity to capture complex spatio-temporal game dynamics and generate strategic plays. The work advances real-time, data-driven sports strategy synthesis and paves the way for broader applications in other competitive, dynamic systems.

Abstract

Dynamically planning in complex systems has been explored to improve decision-making in various domains. Professional basketball serves as a compelling example of a dynamic spatio-temporal game, encompassing context-dependent decision-making. However, processing the diverse on-court signals and navigating the vast space of potential actions and outcomes make it difficult for existing approaches to swiftly identify optimal strategies in response to evolving circumstances. In this study, we formulate the sequential decision-making process as a conditional trajectory generation process. Based on the formulation, we introduce PlayBest (PLAYer BEhavior SynThesis), a method to improve player decision-making. We extend the diffusion probabilistic model to learn challenging environmental dynamics from historical National Basketball Association (NBA) player motion tracking data. To incorporate data-driven strategies, an auxiliary value function is trained with corresponding rewards. To accomplish reward-guided trajectory generation, we condition the diffusion model on the value function via classifier-guided sampling. We validate the effectiveness of PlayBest through simulation studies, contrasting the generated trajectories with those employed by professional basketball teams. Our results reveal that the model excels at generating reasonable basketball trajectories that produce efficient plays. Moreover, the synthesized play strategies exhibit an alignment with professional tactics, highlighting the model's capacity to capture the intricate dynamics of basketball games.
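
The reward-guided generation described in the abstract can be illustrated with a minimal sketch of classifier-guided reverse diffusion. This is not the authors' implementation. It assumes a toy setting in which a "trajectory" is a flat vector, the learned noise model is replaced by the analytic optimum for standard-normal data (`predict_noise`), and the learned value function is replaced by a quadratic with a known gradient (`value_grad`). All function names, the linear noise schedule, and the choice of variance are hypothetical.

```python
import math
import random


def make_schedule(n_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed choice, not taken from the paper)."""
    betas = [beta_start + (beta_end - beta_start) * i / (n_steps - 1)
             for i in range(n_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def predict_noise(x, alpha_bar):
    """Toy stand-in for the noise model: exact when clean data ~ N(0, I)."""
    s = math.sqrt(1.0 - alpha_bar)
    return [s * v for v in x]


def value_grad(x, target):
    """Gradient of the toy value J(x) = -0.5 * ||x - target||^2."""
    return [t - v for v, t in zip(x, target)]


def guided_sample(dim, target, guidance_scale, n_steps=50, seed=0):
    """Reverse diffusion whose mean is shifted along the value gradient."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(n_steps)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure noise
    for t in reversed(range(n_steps)):
        # Standard DDPM posterior mean computed from the predicted noise.
        eps = predict_noise(x, alpha_bars[t])
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(v - coef * e) / math.sqrt(alphas[t]) for v, e in zip(x, eps)]
        # Guidance: move the mean by scale * variance * grad J(mean).
        var = betas[t]
        grad = value_grad(mean, target)
        mean = [m + guidance_scale * var * g for m, g in zip(mean, grad)]
        if t > 0:
            std = math.sqrt(var)
            x = [m + std * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
    return x


if __name__ == "__main__":
    target = [2.0] * 200
    guided = guided_sample(200, target, guidance_scale=5.0)
    plain = guided_sample(200, target, guidance_scale=0.0)
    print("guided mean:", sum(guided) / len(guided))
    print("unguided mean:", sum(plain) / len(plain))
```

Setting `guidance_scale` to zero recovers unguided sampling from the toy model, while larger values pull samples toward the region the toy value function prefers. In PlayBest the analogous roles would be played by the trained diffusion model and the trained reward predictor.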

Paper Structure (21 sections, 4 equations, 4 figures, 5 tables, 1 algorithm)

Figures (4)

  • Figure 1: Overview framework of PlayBest. The overall pipeline can be split into four major components: Frame Labeling, Environmental Dynamics Learning, Value (Perturb) Function Training, and Trajectory Generation Guided by a Reward Function. The diffusion probabilistic model $\epsilon_\theta$ is trained to model the environmental dynamics. The reward predictor $\mathcal{J}_\phi$ is trained on the same trajectories as the diffusion model. During guided trajectory generation, our model takes both environmental dynamics and rewards as input, performs guided planning via conditional sampling, and generates the trajectories as the guided plan.
  • Figure 2: (a, b) The input and diffusion architecture.
  • Figure 3: (a, b, c): Sampled cases of possessions generated by PlayBest. PlayBest learns strategies deviating from existing data yet still aligning with subjective expectations for effective basketball play. The blue team is on offense and moves towards the right basket, while the black team is on defense. The ball is marked in orange. The player who scores for the blue team is highlighted in red (no shot attempt in (b)). Diamonds ($\blacklozenge$) are final positions of the players. More details are in the case study section.
  • Figure 4: (a, b, c): Possessions generated by PlayBest with different $\alpha$.
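
The guided planning step named in the Figure 1 caption can be written in the standard classifier-guidance form. The notation below is an assumption rather than a quotation from the paper: $\tau^i$ denotes the trajectory at diffusion step $i$, $\mu_\theta$ and $\Sigma^i$ are the mean and covariance of the learned reverse transition, and $\alpha$ is taken to be the guidance scale varied in Figure 4, which the captions do not define.

$$
\tau^{i-1} \sim \mathcal{N}\!\left(\mu_\theta(\tau^i) + \alpha\, \Sigma^i\, \nabla_{\tau} \mathcal{J}_\phi(\tau)\Big|_{\tau = \mu_\theta(\tau^i)},\ \Sigma^i\right)
$$

Under this reading, $\alpha = 0$ reduces to sampling from the learned dynamics alone, and increasing $\alpha$ weights the reward predictor $\mathcal{J}_\phi$ more heavily relative to fidelity to the data.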