Exponential Lower Bounds on the Double Oracle Algorithm in Zero-Sum Games

Brian Hu Zhang, Tuomas Sandholm

TL;DR

The paper analyzes the plain double oracle algorithm for two-player zero-sum games, focusing on worst-case convergence. It proves exponential lower bounds on the number of iterations required in both POSGs and EFGs: the EFG bounds require adversarial tiebreaking, whereas the POSG bounds hold regardless of how ties are broken. The constructions include the $2^k$-bigger-number and $n$-bigger-number games, mapped to POSGs. The results show that even compact instances with small Nash support can force exponentially many iterations, highlighting fundamental limitations of the method. The discussion situates these findings relative to fictitious play and $\alpha$-best-response dynamics, and outlines directions for achieving polynomial guarantees or robust variants in future work.

Abstract

The double oracle algorithm is a popular method of solving games, because it is able to reduce computing equilibria to computing a series of best responses. However, its theoretical properties are not well understood. In this paper, we provide exponential lower bounds on the performance of the double oracle algorithm in both partially-observable stochastic games (POSGs) and extensive-form games (EFGs). Our results depend on what is assumed about the tiebreaking scheme -- that is, which meta-Nash equilibrium or best response is chosen, in the event that there are multiple to pick from. In particular, for EFGs, our lower bounds require adversarial tiebreaking, whereas for POSGs, our lower bounds apply regardless of how ties are broken.
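
To make the role of tiebreaking concrete, the following is a minimal sketch of double oracle on a toy normal-form game. It is not one of the paper's constructions. It assumes a simple "bigger-number" game in which each player picks an integer in {0, ..., n-1}, the larger number wins (+1 / -1), and ties pay 0. The function names (`payoff`, `pure_saddle_point`, `best_response`, `double_oracle`) are hypothetical. The meta-solver only searches for a pure saddle point, which suffices for this game; a general implementation would solve the restricted game with a linear program.

```python
def payoff(a, b):
    """Row player's payoff in the assumed toy game: +1 win, -1 loss, 0 tie."""
    return (a > b) - (a < b)


def pure_saddle_point(rows, cols):
    """Return a pure equilibrium (a, b) of the restricted game, or None.

    Assumption: the restricted game has a pure saddle point, which always
    holds for this toy game (the largest available number is dominant).
    """
    for a in rows:
        for b in cols:
            row_ok = all(payoff(a2, b) <= payoff(a, b) for a2 in rows)
            col_ok = all(payoff(a, b2) >= payoff(a, b) for b2 in cols)
            if row_ok and col_ok:
                return a, b
    return None


def best_response(opponent_action, n, as_row, pick):
    """Best response over the full action set; `pick` breaks ties (min or max)."""
    def value(x):
        if as_row:
            return payoff(x, opponent_action)
        return -payoff(opponent_action, x)  # column player minimizes row payoff

    best = max(value(x) for x in range(n))
    return pick(x for x in range(n) if value(x) == best)


def double_oracle(n, pick):
    """Run double oracle from the restricted sets {0} x {0}.

    Returns (number of iterations, equilibrium found).
    """
    rows, cols = [0], [0]
    iterations = 0
    while True:
        iterations += 1
        a, b = pure_saddle_point(rows, cols)        # meta-Nash of restricted game
        br_row = best_response(b, n, True, pick)    # row oracle
        br_col = best_response(a, n, False, pick)   # column oracle
        if br_row in rows and br_col in cols:
            return iterations, (a, b)               # no new strategies: done
        if br_row not in rows:
            rows.append(br_row)
        if br_col not in cols:
            cols.append(br_col)


if __name__ == "__main__":
    for k in range(1, 6):
        n = 2 ** k
        print(k, double_oracle(n, min)[0], double_oracle(n, max)[0])
```

In this sketch, breaking best-response ties toward the smallest winning number (`min`) adds one number per iteration, so the iteration count grows linearly in n, which is exponential in k when n = 2^k. Breaking ties toward the largest number (`max`) terminates after two iterations. This only illustrates why the tiebreaking assumption matters; the paper's POSG bounds hold regardless of tiebreaking and rely on different constructions.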

Paper Structure

This paper contains 8 sections, 5 theorems, 4 equations, 5 figures, 1 table, and 1 algorithm.

Key Result

Theorem 3.1

For every $k \ge 1$, there exists a zero-sum fully-observable stochastic game with $O(k)$ nodes in which, regardless of the initialization and of which meta-Nash equilibria or best responses are chosen, double oracle takes $2^{\Theta(k)}$ iterations to find an exact equilibrium.

Figures (5)

  • Figure 1: The $k$-bit guess-the-string game, here depicted for $k = 4$. The action spaces are $A_1 = A_2 = \{0, 1\}$. The start state is the leftmost state, labeled with $\to$. Terminal states are drawn as rectangles, and their rewards are written within them. Transitions are deterministic, and edges are labeled with the transitions that take them there.
  • Figure 2: The $2^k$-bigger-number game used in \ref{th:posg}, here depicted for $k = 4$. Observations are trivial: $|O| = 1$.
  • Figure 3: The $2^k$-weak bigger-number game used in \ref{th:sg}, here depicted for $k = 4$.
  • Figure 4: A depiction of the game used in \ref{th:efg-nz}. Observations are not shown: the only nontrivial observation each player makes is the randomly selected index $i$. Not all actions and transitions are shown. If a terminal node contains only one reward, then that is the reward of both players.
  • Figure 5: A depiction of the game used in \ref{th:efg}, for $k = 3$. Edges to the start states are labeled with their starting probabilities ($1/3$).

Theorems & Definitions (9)

  • Theorem 3.1
  • Theorem 3.2
  • proof
  • Theorem 3.3
  • proof
  • Theorem 3.4
  • proof
  • Theorem 3.5
  • proof