Expandable Decision-Making States for Multi-Agent Deep Reinforcement Learning in Soccer Tactical Analysis
Kenjiro Ide, Taiga Someya, Kohei Kawaguchi, Keisuke Fujii
TL;DR
This work tackles the challenge of analyzing tactical play in soccer with interpretable, agent‑level decisions in a high‑dimensional, multi‑agent setting. It introduces Expandable Decision‑Making States (EDMS), a semantically rich state representation with an action‑masking scheme that differentiates on‑ball and off‑ball decision spaces, and extends rewards with expected possession value (EPV) signals to better propagate goal probabilities ($L_{td}$, $L_{as}$, $L_{L1}$ are minimized under $\nabla=1$). Empirical results show that EDMS with masking reduces action‑prediction loss and TD error, while qualitative analyses reveal tactical patterns such as fast counters and defensive breakthroughs, with robust cross‑dataset performance via the OpenSTARLab RLearn library. The approach enables cross‑provider benchmarking and reproducible evaluation, offering a practical path toward data‑driven tactical insight and coaching support in real matches.
Abstract
Invasion team sports such as soccer produce a high-dimensional, strongly coupled state space as many players continuously interact on a shared field, which makes quantitative tactical analysis challenging. Traditional rule-based analyses are intuitive, while modern predictive machine learning models often perform pattern-matching without explicit agent representations. The problem we address is how to build player-level agent models from data whose learned values and policies are both tactically interpretable and robust across heterogeneous data sources. Here, we propose Expandable Decision-Making States (EDMS), a semantically enriched state representation that augments raw positions and velocities with relational variables (e.g., scores for space, passing, and scoring), combined with an action-masking scheme that gives on-ball and off-ball agents distinct decision sets. Compared to prior work, EDMS maps learned value functions and action policies to human-interpretable tactical concepts (e.g., marking pressure, passing lanes, ball accessibility) instead of raw coordinate features, and aligns agent choices with the rules of play. In the experiments, EDMS with action masking consistently reduced both action-prediction loss and temporal-difference (TD) error compared to the baseline. Qualitative case studies and Q-value visualizations further indicate that EDMS highlights high-risk, high-reward tactical patterns (e.g., fast counterattacks and defensive breakthroughs). We also integrated our approach into an open-source library and demonstrated compatibility with multiple commercial and open datasets, enabling cross-provider evaluation and reproducible experiments.
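
To make the action-masking idea concrete, the following is a minimal sketch of how on-ball and off-ball agents could be restricted to distinct decision sets when selecting greedy actions and computing a one-step TD error. It is not the authors' implementation or the library's API: the action vocabulary, the disjoint split between on-ball and off-ball actions, and the names `ACTIONS`, `ON_BALL`, `action_mask`, `masked_greedy_action`, and `masked_td_error` are all hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical action vocabulary (an assumption; the source does not list it).
ACTIONS = ["pass", "shot", "dribble",
           "move_up", "move_down", "move_left", "move_right", "idle"]
# Assumption: on-ball and off-ball decision sets are disjoint.
ON_BALL = {"pass", "shot", "dribble"}


def action_mask(has_ball: bool) -> np.ndarray:
    """Return a boolean mask that is True where an action is allowed."""
    return np.array([(a in ON_BALL) == has_ball for a in ACTIONS])


def masked_greedy_action(q_values: np.ndarray, has_ball: bool) -> str:
    """Pick the highest-valued action among those the agent may take."""
    masked_q = np.where(action_mask(has_ball), q_values, -np.inf)
    return ACTIONS[int(np.argmax(masked_q))]


def masked_td_error(reward: float, q_sa: float, next_q: np.ndarray,
                    next_has_ball: bool, gamma: float = 0.99,
                    done: bool = False) -> float:
    """One-step TD error whose bootstrap maximum ignores masked actions."""
    if done:
        return reward - q_sa
    next_masked = np.where(action_mask(next_has_ball), next_q, -np.inf)
    return reward + gamma * float(np.max(next_masked)) - q_sa


# Illustrative Q-values for one agent (made-up numbers, not results).
q = np.array([0.1, 0.9, 0.3, 0.5, 0.2, 0.0, 0.4, 0.6])
on_ball_choice = masked_greedy_action(q, has_ball=True)
off_ball_choice = masked_greedy_action(q, has_ball=False)
```

Under these assumptions, the same Q-vector yields different choices depending on ball possession, because invalid actions are set to negative infinity before the maximum is taken. Applying the same mask inside the TD target keeps the bootstrap value from being driven by actions the next state's agent could not take.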
