Leveraging Team Correlation for Approximating Equilibrium in Two-Team Zero-Sum Games

Naming Liu, Mingzhi Wang, Youzhi Zhang, Yaodong Yang, Bo An, Ying Wen

TL;DR

This work addresses the challenge of finding unexploitable equilibria in large two-team zero-sum games. It formalizes Correlated-Team Maxmin Equilibrium ($CTME$) as the optimal unexploitable policy under joint coordination, but recognizes that exact computation is intractable due to exponential joint policy spaces; to tackle this, it introduces restricted CTME ($rCTME$) with a configurable sample factor and a sequential correlation mechanism. The authors propose Sequential PSRO ($S$-PSRO), a practical algorithm that approximates $rCTME$ by constructing sequentially correlated team policies via a Sequential Best Response ($SeBR$) and a sequential communication protocol, allowing scalable learning across increasingly large games. Empirically, $S$-PSRO achieves lower exploitability and superior performance compared with baselines in Normal Form games (SAD), MAgent Battle, and Google Research Football, demonstrating its effectiveness in diverse, large-scale multi-agent settings. These contributions offer a scalable framework for robust strategic decision-making in adversarial multi-agent environments with potential applications in complex online games and security domains.

Abstract

Two-team zero-sum games are one of the most important paradigms in game theory. In this paper, we focus on finding an unexploitable equilibrium in large team games. An unexploitable equilibrium is a worst-case policy, under which members of the opponent team cannot increase their team reward by adopting any other policy, e.g., by cooperatively switching to other joint policies. As an optimal unexploitable equilibrium in two-team zero-sum games, the correlated-team maxmin equilibrium remains unexploitable even in the worst case, where players on the opponent team can achieve arbitrary cooperation through a joint team policy. However, finding such an equilibrium in large games is challenging because evaluating the exponentially large number of joint policies is impractical. To solve this problem, we first introduce a general solution concept called restricted correlated-team maxmin equilibrium, which uses a sample factor to address the impossibility of evaluating all joint policies while avoiding the exploitation problem that arises under incomplete joint policy evaluation. We then develop an efficient sequential correlation mechanism, based on which we propose an algorithm for approximating the unexploitable equilibrium in large games. We show that our approach achieves lower exploitability than the state-of-the-art baseline when facing opponent teams with different exploitation abilities in large team games, including Google Research Football.
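
To make the worst-case idea in the abstract concrete, the following is a minimal sketch on a toy normal-form game. It is not the paper's algorithm and does not implement the restricted correlated-team maxmin equilibrium or its sample factor. The payoff function `team_a_reward`, the two-member teams, and the chosen policies are all hypothetical. The sketch only shows that a team policy's guaranteed reward is the minimum over the opponent team's joint actions, and that evaluating only some of those joint actions can overestimate it.

```python
from itertools import product

# Hypothetical toy game: each team has two members with two actions each,
# so each team has 2 * 2 = 4 joint actions.
ACTIONS = (0, 1)
JOINT_ACTIONS = list(product(ACTIONS, ACTIONS))


def team_a_reward(a, b):
    """Illustrative zero-sum payoff for team A (team B receives the negative).

    Assumption for this sketch: team A loses 1 when team B's joint action
    matches its own, and wins 1 otherwise.
    """
    return -1.0 if a == b else 1.0


def expected_reward(policy, b):
    """Expected team A reward of a correlated joint policy against joint action b."""
    return sum(prob * team_a_reward(a, b) for a, prob in policy.items())


def worst_case_value(policy, opponent_joint_actions):
    """Reward team A is guaranteed against the listed opponent joint actions.

    Expected reward is linear in the opponent's policy, so checking pure
    joint actions is enough to find the opponent team's best response.
    """
    return min(expected_reward(policy, b) for b in opponent_joint_actions)


# A correlated team policy that mixes uniformly over all joint actions.
uniform_policy = {a: 0.25 for a in JOINT_ACTIONS}

# A team policy that only mixes over two joint actions.
skewed_policy = {(0, 0): 0.5, (0, 1): 0.5}

# Full evaluation: the opponent team may pick any joint action.
uniform_full = worst_case_value(uniform_policy, JOINT_ACTIONS)
skewed_full = worst_case_value(skewed_policy, JOINT_ACTIONS)

# Incomplete evaluation: only two opponent joint actions are checked.
evaluated_subset = [(1, 0), (1, 1)]
skewed_partial = worst_case_value(skewed_policy, evaluated_subset)

print("uniform policy, full evaluation:", uniform_full)
print("skewed policy, full evaluation:", skewed_full)
print("skewed policy, partial evaluation:", skewed_partial)
```

In this toy game the skewed policy looks safe when only the subset is evaluated, yet a jointly deviating opponent team can push its reward lower. This is the kind of exploitation under incomplete joint policy evaluation that the abstract refers to.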

Paper Structure (23 sections, 14 equations, 6 figures, 2 tables, 2 algorithms)

Figures (6)

  • Figure 1: Mechanism of Sequential Correlation, under which the team policy search process is organized as a sequential search tree. The sequential information sharing is implemented through a communication channel.
  • Figure 2: Deviation Policy Space Comparison of rCTME under Different Sequential Correlation Mechanisms, where No Correlation and Pivot-followers Correlation are two examples of Sequential Correlation. In the scenario depicted, both teams have two players, referred to as member 1 and member 2. The horizontal and vertical axes represent the actions (policies) of the two players, the squares in the two-dimensional space indicate the team joint actions (policies), and the direction of the arrows indicates an increase in the joint action (policy) reward. Because the two teams are set up symmetrically, only the deviation policy space of one team is illustrated.
  • Figure 3: Performance of S-PSRO, Team-PSRO and Indep-PSRO in Google Research Football, where S-PSRO surpasses the baselines.
  • Figure 4: Exploitability calculated in SAD games.
  • Figure 5: Cooperative Behaviours of S-PSRO for Passing in Google Research Football
  • ...and 1 more figure