Solving Football by Exploiting Equilibrium Structure of 2p0s Differential Games with One-Sided Information
Mukesh Ghimire, Lei Zhang, Zhe Xu, Yi Ren
TL;DR
This work tackles the scalability challenge of two-player imperfect-information differential games by focusing on two-player zero-sum games with one-sided information (2p0s1), where the informed player (P1) knows the payoff while the uninformed player (P2) holds a belief over $I$ possible payoff types. The authors prove that, under mild conditions, equilibrium strategies are $I$-atomic for P1 and $(I+1)$-atomic for P2, collapsing the game-tree complexity over $K$ time steps to at most $I^K$ for P1 and $(I+1)^K$ for P2, which enables efficient solution when $I\ll U$, where $U$ is the size of the action space. They introduce primal-dual reformulations $G_\tau$ and $G^*_{\tau}$ whose equilibria are atomic and converge to the Nash equilibrium (NE) of the original game as $\tau\to 0^+$, and develop CAMS (continuous-action minimax solvers) coupled with multigrid value approximation to solve the resulting minimax problems. The framework is demonstrated on Hexner’s game and a 22-player American football setting, showing substantial improvements in learning accuracy and computational efficiency over state-of-the-art IIEFG solvers. In football, the approach reveals practical deception windows and football-like tactics within a compute budget feasible on consumer hardware. Overall, the paper provides a theoretically grounded, scalable path to solving continuous-action, one-sided-information differential games, with applications in sports, defense, cybersecurity, and finance, by combining the atomic NE structure with multiagent reinforcement learning (MARL) and model predictive control (MPC).
Abstract
For a two-player imperfect-information extensive-form game (IIEFG) with $K$ time steps and a player action space of size $U$, the game-tree complexity is $U^{2K}$, causing existing IIEFG solvers to struggle with large or infinite $(U,K)$, e.g., differential games with continuous action spaces. To partially address this scalability challenge, we focus on an important class of two-player zero-sum (2p0s) games where the informed player (P1) knows the payoff while the uninformed player (P2) only has a belief over the set of $I$ possible payoffs. Such games encompass a wide range of scenarios in sports, defense, cybersecurity, and finance. We prove that under mild conditions, P1's (resp. P2's) equilibrium strategy at any infostate concentrates on at most $I$ (resp. $I+1$) action prototypes. When $I\ll U$, this equilibrium structure causes the game-tree complexity to collapse to $I^K$ for P1 when P2 plays pure best responses, and to $(I+1)^K$ for P2 in a dual game where P1 plays pure best responses. We then show that exploiting this structure in standard learning modes, i.e., model-free multiagent reinforcement learning and model predictive control, is straightforward, leading to significant improvements in learning accuracy and efficiency over state-of-the-art (SOTA) IIEFG solvers. Our demonstration solves a 22-player football game ($K=10$, $U=\infty$) where the attacking team has to strategically conceal its intention until a critical moment in order to exploit its information advantage. Code is available at https://github.com/ghimiremukesh/cams/tree/iclr
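To give a sense of the scale of this collapse, the following minimal sketch evaluates the three complexity expressions stated above. Only $K=10$ comes from the football demonstration; the finite action count `U = 100` (standing in for a discretized continuous action space) and the number of payoff types `I = 2` are assumptions chosen purely for illustration, and the function names are hypothetical rather than taken from the released code.

```python
def full_tree_size(U: int, K: int) -> int:
    """Game-tree complexity U^(2K) of a two-player IIEFG with K steps."""
    return U ** (2 * K)


def p1_atomic_tree_size(I: int, K: int) -> int:
    """Complexity I^K for P1 when P2 plays pure best responses."""
    return I ** K


def p2_atomic_tree_size(I: int, K: int) -> int:
    """Complexity (I+1)^K for P2 in the dual game."""
    return (I + 1) ** K


K = 10   # time steps, as in the football demonstration
U = 100  # ASSUMPTION: hypothetical discretization of the action space
I = 2    # ASSUMPTION: hypothetical number of payoff types

full = full_tree_size(U, K)
p1 = p1_atomic_tree_size(I, K)
p2 = p2_atomic_tree_size(I, K)

print(f"full tree    U^(2K)  = {full:.3e}")
print(f"P1 (atomic)  I^K     = {p1}")
print(f"P2 (atomic)  (I+1)^K = {p2}")
```

Under these assumed values, the full tree has $10^{40}$ branches, while the atomic structure leaves $2^{10}=1024$ for P1 and $3^{10}=59049$ for P2. With a truly continuous action space ($U=\infty$) the first quantity is unbounded, whereas the latter two do not depend on $U$.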
