Computing Ex Ante Equilibrium in Heterogeneous Zero-Sum Team Games

Naming Liu, Mingzhi Wang, Xihuai Wang, Weinan Zhang, Yaodong Yang, Youzhi Zhang, Bo An, Ying Wen

TL;DR

This work tackles the challenge of computing ex ante equilibria in heterogeneous two-team zero-sum games. It identifies expressiveness and convergence limitations in homogeneous PSRO approaches and proves that representing heterogeneous teammate policies makes the global team-maxmin equilibrium with a coordination device (TMECor) reachable. The authors introduce H-PSRO, which embeds a sequential correlation mechanism to search the enlarged policy space without exponential cost, and they establish theoretical guarantees of improved coordination and lower exploitability. In experiments on matrix games, Competitive StarCraft, and Google Research Football, H-PSRO shows better convergence, robustness, and performance than the baselines it is compared against. The framework offers a scalable path to ex ante coordination in multi-agent settings where teammates play distinct roles.

Abstract

The ex ante equilibrium for two-team zero-sum games, where agents within each team collaborate to compete against the opposing team, is known to be the best a team can do for coordination. Many existing works on ex ante equilibrium solutions aim to extend ex ante equilibrium solving to large-scale team games based on Policy Space Response Oracles (PSRO). However, the joint team policy space constructed by the most prominent method, Team PSRO, cannot cover the entire team policy space in heterogeneous team games, where teammates play distinct roles. This insufficient policy expressiveness traps Team PSRO in a sub-optimal ex ante equilibrium with significantly higher exploitability, so it never converges to the global ex ante equilibrium. To find the global ex ante equilibrium without introducing additional computational complexity, we first parameterize heterogeneous policies for teammates, and we prove that optimizing the heterogeneous teammates' policies sequentially guarantees a monotonic improvement in team rewards. We further propose Heterogeneous-PSRO (H-PSRO), which integrates the sequential correlation mechanism into the PSRO framework and serves as the first PSRO framework for heterogeneous team games. We prove that H-PSRO achieves lower exploitability than Team PSRO in heterogeneous team games. Empirically, H-PSRO converges in heterogeneous matrix games that non-heterogeneous baselines cannot solve. Further experiments show that H-PSRO outperforms non-heterogeneous baselines in both heterogeneous and homogeneous team games.
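The sequential-optimization idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' algorithm: it assumes a hypothetical 2x2 team payoff table against a fixed opponent and replaces the paper's policy optimization with exact best responses. The names `payoff`, `best_response`, and `sequential_update` are made up for illustration. It only shows why updating one teammate at a time, while holding the other fixed, can never lower the team reward.

```python
def expected_reward(payoff, pi1, pi2):
    """Expected team reward when teammates sample actions independently."""
    return sum(pi1[i] * pi2[j] * payoff[i][j]
               for i in range(len(pi1)) for j in range(len(pi2)))


def best_response(values):
    """Deterministic policy that puts all mass on the highest-value action."""
    best = max(range(len(values)), key=values.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(values))]


def sequential_update(payoff, pi1, pi2, rounds=3):
    """Update teammate 1, then teammate 2, recording the reward after each step."""
    history = [expected_reward(payoff, pi1, pi2)]
    for _ in range(rounds):
        # Teammate 1 best-responds to teammate 2's current policy.
        values1 = [sum(pi2[j] * payoff[i][j] for j in range(len(pi2)))
                   for i in range(len(pi1))]
        pi1 = best_response(values1)
        history.append(expected_reward(payoff, pi1, pi2))
        # Teammate 2 best-responds to teammate 1's *updated* policy.
        values2 = [sum(pi1[i] * payoff[i][j] for i in range(len(pi1)))
                   for j in range(len(pi2))]
        pi2 = best_response(values2)
        history.append(expected_reward(payoff, pi1, pi2))
    return pi1, pi2, history


# Hypothetical team payoff (rows: teammate 1's action, columns: teammate 2's).
payoff = [[1.0, 0.0],
          [0.0, 2.0]]
pi1, pi2, history = sequential_update(payoff, [0.5, 0.5], [0.5, 0.5])
```

Each update keeps the other teammate's policy fixed, so the current policy is always a feasible choice and the recorded reward sequence is non-decreasing. In general, such coordinate-wise updates only guarantee improvement, not convergence to a global optimum.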

Paper Structure (29 sections, 18 equations, 5 figures, 2 tables, 2 algorithms)

Figures (5)

  • Figure 1: Procedure of the homogeneous PSRO framework in Team Rock-Paper-Scissors, which is a typical heterogeneous team game, with four agents, two teams $T_1 = \{\text{M}_1, \text{M}_2\}$ and $T_2 = \{\text{O}_1, \text{O}_2\}$, one state, and joint action spaces $\boldsymbol{\mathcal{A}}_1 = \boldsymbol{\mathcal{A}}_2 = \{a, b\}^2$. Agents play Rock-Paper-Scissors between the teams: if player $\text{M}_1$ in team $T_1$ (or $\text{O}_1$ in team $T_2$) chooses action $b$, then the team plays Scissors no matter the choice of the other player in the team; if both players choose action $a$, then the team plays Rock; otherwise, the team plays Paper. The two players in team $T_1$ or opponent team $T_2$ are heterogeneous because the actions $a$ and $b$ serve different functions for them. Specifically, player $\text{M}_1$ (or $\text{O}_1$) can unilaterally choose the team decision Scissors by playing action $b$, while player $\text{M}_2$ (or $\text{O}_2$) must coordinate with the other player to choose Paper by playing action $b$.
  • Figure 2: Trajectories of SP, FSP, Team PSRO and H-PSRO in the Team RPS game (Example 1). H-PSRO converges to the global TMECor because heterogeneous policies are sufficiently expressive, which gives the heterogeneous PSRO framework full equilibrium expressiveness (see Theorem 1).
  • Figure 3: Performance of H-PSRO and Team PSRO in a typical Matrix Heterogeneous Team Game.
  • Figure 4: Exploitability of H-PSRO and Team PSRO in the Competitive StarCraft Benchmark.
  • Figure 5: Performance of H-PSRO, Team PSRO and Indep-PSRO in Google Research Football.
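The Team Rock-Paper-Scissors rules in the Figure 1 caption can be checked with a few lines of code. The sketch below makes one assumption that goes beyond the caption: "homogeneous" is read as both teammates sampling from the same action distribution. The helper names (`team_decision`, `decision_distribution`) are illustrative and do not come from the paper. Under that reading, a shared policy that plays $b$ with probability $p$ yields Paper with probability $p(1-p) \le 1/4$, so the uniform Rock-Paper-Scissors mixture is out of reach, whereas distinct per-player policies reach it.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ("a", "b")


def team_decision(first, second):
    """Map the two teammates' actions to the team's move (Figure 1 rules)."""
    if first == "b":
        return "Scissors"  # the first player decides Scissors unilaterally
    if second == "a":
        return "Rock"      # both players chose a
    return "Paper"         # first chose a, second chose b


def decision_distribution(p_first_b, p_second_b):
    """Distribution over team moves when teammates sample independently."""
    prob_b = (Fraction(p_first_b), Fraction(p_second_b))
    dist = {"Rock": Fraction(0), "Paper": Fraction(0), "Scissors": Fraction(0)}
    for first, second in product(ACTIONS, repeat=2):
        p1 = prob_b[0] if first == "b" else 1 - prob_b[0]
        p2 = prob_b[1] if second == "b" else 1 - prob_b[1]
        dist[team_decision(first, second)] += p1 * p2
    return dist


# Shared policy: both teammates play b with the same probability p.
grid = [Fraction(k, 100) for k in range(101)]
max_paper_shared = max(decision_distribution(p, p)["Paper"] for p in grid)

# Heterogeneous policies: the first player plays b w.p. 1/3, the second w.p. 1/2.
heterogeneous = decision_distribution(Fraction(1, 3), Fraction(1, 2))
```

Because the bound on Paper holds for every shared policy, it also holds for any mixture over shared policies, which is one way to see the expressiveness gap the caption describes. This check does not model the PSRO loop itself.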
