Understanding Optimal Portfolios of Strategies for Solving Two-player Zero-sum Games

Karolina Drabent, Ondřej Kubíček, Viliam Lisý

TL;DR

This work addresses the challenge of constructing optimal portfolios of strategies in two-player zero-sum games by proving the problem is NP-hard and showing that common heuristics can perform poorly. It introduces an ε-dominance framework and corresponding MILP formulations (ε-Dom-MILP and ε-Dom-Mixed-MILP) to bound and compute portfolios with provable exploitability guarantees, including both pure and mixed portfolios. Through extensive experiments on random and benchmark games, the authors demonstrate that mixed portfolios and global optimization over portfolios often outperform intuitive heuristics and competitive baselines, with strong performance on real game domains. The paper provides a principled, practical toolkit for portfolio construction and evaluation that lays the groundwork for scalable, robust methods in large-scale imperfect-information games.

Abstract

In large-scale games, approximating the opponent's strategy space with a small portfolio of representative strategies is a common and powerful technique. However, the construction of these portfolios often relies on domain-specific knowledge or heuristics with no theoretical guarantees. This paper establishes a formal foundation for portfolio-based strategy approximation. We define the problem of finding an optimal portfolio in two-player zero-sum games and prove that this optimization problem is NP-hard. We demonstrate that several intuitive heuristics, such as using the support of a Nash equilibrium or building portfolios incrementally, can lead to highly suboptimal solutions. These negative results underscore the problem's difficulty and motivate the need for robust, empirically validated heuristics. To this end, we introduce an analytical framework to bound portfolio quality and propose a methodology for evaluating heuristic approaches. Our evaluation of several heuristics shows that their success heavily depends on the specific game being solved. Our code is publicly available.

Paper Structure

This paper contains 34 sections, 10 theorems, 14 equations, 5 figures, and 3 tables.

Key Result

Theorem 5

Let $n$ be the number of pure strategies of Player 2 and $k<n$ the desired size of a portfolio. Deciding whether there is a portfolio of size $k$ with an exploitability lower than $\frac{1}{2n}$ is an NP-hard problem. Consequently, finding the optimal portfolio of a given size is also NP-hard.

Figures (5)

  • Figure 1: The strategy optimization process using a portfolio $P=\{\pi_{p1},\pi_{p2},\pi_{p3} \}$ in a game $G=(A_1, A_2,u)$. The strategy space is a multidimensional simplex, drawn here in two dimensions for simplicity. The left simplex represents the strategy space of Player 1, $\Delta(A_1)$, and the right one that of Player 2, $\Delta(A_2)$. Within the latter, the gray simplex represents the strategy space defined by the portfolio, $\Delta(P)$. The NE of the restricted game $G(A_1, P)$, $\pi, \pi \in f_{NE}(G(A_1, P))$, is computed over $\Delta(A_1)$ and $\Delta(P)$. Then the utility of portfolio $P$, $u_{f_{NE}}(G,P)$, is computed by fixing Player 1's strategy and finding the best response of Player 2 in the full strategy space $\Delta(A_2)$.
  • Figure 2: Relation between pessimistic exploitability $ex_{PES}$ and portfolio size for different $\epsilon$ values. Portfolios were found by $\epsilon$-Dom-MILP with the objective changed to minimize portfolio size for the given $\epsilon$ bound. The experiment was run on $N=100$ random games; the mean is plotted and the standard error is shown by the shaded area.
  • Figure 3: Comparison of $\epsilon$-Dom-MILP, $\epsilon$-Dom-Mixed-MILP, and Greedy-K on games with action-set sizes $|A_1|=|A_2|=25$. The experiment was run on $100$ different random games; the mean is plotted and the standard error is shown by the shaded area.
  • Figure 4: RM+ exploitability of different methods on extensive-form games. Online algorithms are marked with a dashed line. For methods that depend on stochasticity, results were computed for 50 different seeds, and the average and standard error are shown. The GCT method uses a scaling factor of 0.3 and gradients from 10k iterations.
  • Figure 5: Additional game sizes. Experiments on 100 random games.
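
The evaluation procedure described in the Figure 1 caption can be sketched for a small matrix game. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names `solve_linear`, `maximin`, and `portfolio_exploitability` are hypothetical, $u$ is assumed to hold Player 1's payoffs with Player 2 minimizing, and exploitability is assumed to be the game value minus the portfolio utility. The exact solver enumerates equal-size supports, so it is only suitable for tiny games.

```python
from fractions import Fraction
from itertools import combinations


def solve_linear(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination; None if singular."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


def maximin(matrix):
    """Exact maximin strategy and value for the row player (the maximizer).

    Candidates are the solutions of square systems that equalize the payoff
    against a column subset; the best guaranteed payoff among them is kept.
    """
    rows, cols = len(matrix), len(matrix[0])
    best_value, best_x = None, None
    for size in range(1, min(rows, cols) + 1):
        for s1 in combinations(range(rows), size):
            for s2 in combinations(range(cols), size):
                # Unknowns: x_i for i in s1, then v.
                a = [[matrix[i][j] for i in s1] + [Fraction(-1)] for j in s2]
                a.append([Fraction(1)] * size + [Fraction(0)])
                b = [Fraction(0)] * size + [Fraction(1)]
                sol = solve_linear(a, b)
                if sol is None or any(x < 0 for x in sol[:size]):
                    continue
                x = [Fraction(0)] * rows
                for i, xi in zip(s1, sol[:size]):
                    x[i] = xi
                guaranteed = min(
                    sum(x[i] * matrix[i][j] for i in range(rows))
                    for j in range(cols)
                )
                if best_value is None or guaranteed > best_value:
                    best_value, best_x = guaranteed, x
    return best_x, best_value


def portfolio_exploitability(u, portfolio):
    """Assumed definition: full-game value minus the portfolio utility.

    `portfolio` is a list of Player 2 strategies, each a probability vector
    over A_2 given as ints or Fractions.
    """
    u = [[Fraction(x) for x in row] for row in u]
    # Restricted game G(A_1, P): column j is the payoff against strategy j.
    restricted = [
        [sum(u_ia * p_a for u_ia, p_a in zip(row, p)) for p in portfolio]
        for row in u
    ]
    pi1, _ = maximin(restricted)
    # Fix Player 1's strategy; Player 2 best-responds over the full A_2.
    utility = min(
        sum(pi1[i] * u[i][a] for i in range(len(u))) for a in range(len(u[0]))
    )
    _, value = maximin(u)
    return value - utility


# Rock-paper-scissors payoffs for Player 1 (rows) against Player 2 (columns).
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
ROCK, PAPER, SCISSORS = [1, 0, 0], [0, 1, 0], [0, 0, 1]

print(portfolio_exploitability(RPS, [ROCK, PAPER]))  # 2/3
```

In this toy game, a portfolio containing only Rock leads Player 1 to play Paper, which Scissors then exploits for an exploitability of 1; adding Paper lowers it to 2/3, and the full pure-strategy set gives 0. When the restricted game has several equilibria, the result depends on which one the solver returns, which is why the caption's $f_{NE}$ is a selection function.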

Theorems & Definitions (22)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Theorem 5
  • Theorem 6
  • Theorem 7
  • Theorem 10
  • Theorem 11
  • Theorem 12
  • ...and 12 more