Solving Football by Exploiting Equilibrium Structure of 2p0s Differential Games with One-Sided Information

Mukesh Ghimire, Lei Zhang, Zhe Xu, Yi Ren

TL;DR

This work tackles the scalability challenge of two-player imperfect-information differential games by focusing on two-player zero-sum games with one-sided information (2p0s1), where the informed player (P1) knows the payoff while the uninformed player (P2) holds only a belief over the $I$ possible payoff types. The authors prove that, under mild conditions, equilibrium strategies are $I$-atomic for P1 and $(I+1)$-atomic for P2. With $K$ time steps and an action space of size $U$, this collapses the game-tree complexity from $U^{2K}$ to at most $I^K$ for P1 or $(I+1)^K$ for P2, enabling efficient solution when $I\ll U$. They introduce primal-dual reformulations $G_\tau$ and $G^*_{\tau}$ whose equilibria are atomic and converge to the Nash equilibrium (NE) of the original game as $\tau\to 0^+$, and develop CAMS (continuous-action minimax solvers) coupled with multigrid value approximation to solve the resulting minimax problems. The framework is demonstrated on Hexner’s game and a 22-player American football setting, showing substantial improvements in learning accuracy and computational efficiency over state-of-the-art imperfect-information extensive-form game (IIEFG) solvers. In football, the approach reveals practical deception windows and football-like tactics within feasible compute on consumer hardware. Overall, the paper provides a theoretically grounded, scalable path to solving continuous-action, one-sided-information differential games, with applications in sports, defense, cybersecurity, and finance, by combining the atomic NE structure with multiagent reinforcement learning (MARL) and model predictive control (MPC).

Abstract

For a two-player imperfect-information extensive-form game (IIEFG) with $K$ time steps and a player action space of size $U$, the game tree complexity is $U^{2K}$, causing existing IIEFG solvers to struggle with large or infinite $(U,K)$, e.g., differential games with continuous action spaces. To partially address this scalability challenge, we focus on an important class of two-player zero-sum (2p0s) games where the informed player (P1) knows the payoff while the uninformed player (P2) only has a belief over the set of $I$ possible payoffs. Such games encompass a wide range of scenarios in sports, defense, cybersecurity, and finance. We prove that under mild conditions, P1's (resp. P2's) equilibrium strategy at any infostate concentrates on at most $I$ (resp. $I+1$) action prototypes. When $I\ll U$, this equilibrium structure causes the game tree complexity to collapse to $I^K$ for P1 when P2 plays pure best responses, and $(I+1)^K$ for P2 in a dual game where P1 plays pure best responses. We then show that exploiting this structure in standard learning modes, i.e., model-free multiagent reinforcement learning and model predictive control, is straightforward, leading to significant improvements in learning accuracy and efficiency over SOTA IIEFG solvers. Our demonstration solves a 22-player football game ($K=10$, $U=\infty$) where the attacking team has to strategically conceal its intention until a critical moment in order to exploit its information advantage. Code is available at https://github.com/ghimiremukesh/cams/tree/iclr
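To get a feel for the size of the collapse described in the abstract, the three complexity expressions $U^{2K}$, $I^K$, and $(I+1)^K$ can simply be evaluated. The sketch below uses the football setting's $K=10$ and $I=2$; because the paper's action space is continuous ($U=\infty$), the value `U=100` is a hypothetical discretization chosen only for illustration, and `tree_sizes` is an illustrative helper name rather than anything from the authors' code.

```python
def tree_sizes(U: int, K: int, I: int) -> tuple[int, int, int]:
    """Evaluate the three game-tree complexities quoted in the abstract.

    U: actions per player per infostate (hypothetical discretization here)
    K: number of time steps
    I: number of payoff types
    """
    full = U ** (2 * K)      # generic two-player IIEFG
    primal = I ** K          # P1's tree when P2 plays pure best responses
    dual = (I + 1) ** K      # P2's tree in the dual game
    return full, primal, dual


# K and I follow the football example; U = 100 is an assumed discretization.
full, primal, dual = tree_sizes(U=100, K=10, I=2)
print(f"U^(2K)  = {full:.3e}")
print(f"I^K     = {primal}")   # 1024
print(f"(I+1)^K = {dual}")     # 59049
```

Even with this coarse assumed discretization, the generic tree is dozens of orders of magnitude larger than either reduced tree, which is the regime $I\ll U$ that the abstract targets.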

Paper Structure

This paper contains 69 sections, 14 theorems, 98 equations, 10 figures, 2 tables, 1 algorithm.

Key Result

Theorem 4.1

The RHS of Eq. eq:primal-subdp can be reformulated as i.e., $\eta_{i,\tau}^\dagger$ concentrates on actions $\{u^k\}_{k=1}^I$ for $i \in [I]$. The RHS of Eq. eq:dual-subdp can be reformulated as

Figures (10)

  • Figure 1: (a) IIEFG with $U$ actions per player per infostate and $K$ time steps has a game-tree complexity of $U^{2K}$. For 2p0s1 with $I$ payoff types, deterministic dynamics, and Isaacs' condition, we show that the NE is $I$-atomic for P1 and $(I+1)$-atomic for P2, leading to a game-tree complexity of $I^K$ for P1 in the primal game where P2 plays best responses and $(I+1)^K$ for P2 in the dual game where P1 plays best responses. (b) American Football with 22 players and continuous action spaces ($U=\infty$) with $K=10$ time steps. P1 (red) attacks with two private game types ($I=2$): Running back (RB) power-runs through the space created by blockers, and quarterback (QB) throws the ball to the leading wide receiver (WR). See https://github.com/ghimiremukesh/cams/tree/iclr/README.md. (c) At NE, P1 conceals type until 0.5 sec., similar to the reported 1.0 sec. Due to significant tree size reduction, the game can be solved in 30 minutes.
  • Figure 2: Value convexification causes NE to be atomic.
  • Figure 3: (a-c) Comparisons between CAMS, JPSPG, CFR+, MMD, and CFR-BR-Primal on the 1-stage Hexner's game. (d) Comparison between CAMS, JPSPG, and DeepCFR on the 4-stage Hexner's game with similar compute.
  • Figure 4: (a) Schematic of Hexner's game: one of two possible goals, Goal-1 or Goal-2, is selected and communicated to P1. (b-e) Sample trajectories for the primal game (b-c), where P1 plays Nash and P2 plays a best response, and for the primal-dual game (d-e), where both players play Nash. Dotted lines are the ground-truth Nash. Color shades indicate the evolution of the public belief (Pr[Goal is 1]). The filled magenta circle represents the true goal. Initial position pairs are marked with the same markers.
  • Figure 5: Comparisons between CAMS-DRL and standard PG methods on (a) 1-stage and (b) 4-stage games.
  • ...and 5 more figures
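One way to read the Figure 2 caption ("value convexification causes NE to be atomic") is through the lower convex envelope of a value function over beliefs. For $I=2$ the belief is a scalar $p=\Pr[\text{type 1}]$, and any point on the envelope is a convex combination of at most two points of the original graph, so the prior splits into at most $I=2$ posterior beliefs. The sketch below is a toy illustration under those assumptions, not the CAMS solver: the double-well value landscape is made up, and `lower_hull` and `split_belief` are hypothetical helper names.

```python
import numpy as np


def lower_hull(p, v):
    """Indices of the lower convex hull of points (p[i], v[i]); p must be increasing."""
    hull = []
    for i in range(len(p)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (p[b] - p[a]) * (v[i] - v[a]) - (v[b] - v[a]) * (p[i] - p[a])
            if cross <= 0:  # b lies on or above segment a -> i: not a lower-hull vertex
                hull.pop()
            else:
                break
        hull.append(i)
    return hull


def split_belief(p0, p, v):
    """Return (posterior atoms, weights, convexified value) at prior p0 in [p[0], p[-1]]."""
    hull = lower_hull(p, v)
    hp, hv = p[hull], v[hull]
    j = int(np.searchsorted(hp, p0))  # first hull vertex with belief >= p0
    if np.isclose(hp[j], p0):
        # The prior already sits on the envelope: a single atom suffices.
        return np.array([hp[j]]), np.array([1.0]), float(hv[j])
    lam = (hp[j] - p0) / (hp[j] - hp[j - 1])  # weight on the left posterior
    atoms = np.array([hp[j - 1], hp[j]])
    weights = np.array([lam, 1.0 - lam])
    return atoms, weights, float(weights @ np.array([hv[j - 1], hv[j]]))


# Made-up nonconvex value over beliefs (assumption, for illustration only).
p = np.linspace(0.0, 1.0, 101)
v = np.minimum((p - 0.2) ** 2, (p - 0.8) ** 2)

atoms, weights, v_cvx = split_belief(0.5, p, v)
print("posterior atoms:", atoms)
print("weights:", weights)
print("original value at prior:", v[50], "convexified value:", v_cvx)
```

On this toy landscape the prior $p=0.5$ splits into two posteriors near $0.2$ and $0.8$ with roughly equal weight, and the weighted average of the posteriors recovers the prior. With $I$ types the belief lives on an $(I-1)$-dimensional simplex, and the same argument gives at most $I$ atoms.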

Theorems & Definitions (27)

  • Theorem 4.1
  • Theorem 4.2
  • Theorem 5.1
  • proof
  • Lemma C.1: Value properties
  • proof
  • Lemma C.2: Quadratic contact
  • proof
  • Lemma C.3: 1‐Lipschitz property
  • proof
  • ...and 17 more