Solving Imperfect-Recall Games via Sum-of-Squares Optimization

Rui Zheng, Ryann Sim, Antonios Varvitsiotis

TL;DR

This work proposes sum-of-squares (SOS) hierarchies for computing ex-ante optimal strategies in single-player IREFGs and Nash equilibria in multi-player IREFGs, and shows that in the single-player setting the SOS hierarchy converges at the first level, enabling equilibrium computation with a single semidefinite program (SDP).

Abstract

Extensive-form games (EFGs) provide a powerful framework for modeling sequential decision making, capturing strategic interaction under imperfect information, chance events, and temporal structure. Most positive algorithmic and theoretical results for EFGs assume perfect recall, where players remember all past information and actions. We study the increasingly relevant setting of imperfect-recall EFGs (IREFGs), where players may forget parts of their history or previously acquired information, and where equilibrium computation is provably hard. We propose sum-of-squares (SOS) hierarchies for computing ex-ante optimal strategies in single-player IREFGs and Nash equilibria in multi-player IREFGs, working over behavioral strategies. Our theoretical results show that (i) these hierarchies converge asymptotically, (ii) under genericity assumptions, the convergence is finite, and (iii) in single-player non-absentminded IREFGs, convergence occurs at a finite level determined by the number of information sets. Finally, we introduce the new classes of (SOS)-concave and (SOS)-monotone IREFGs, and show that in the single-player setting the SOS hierarchy converges at the first level, enabling equilibrium computation with a single semidefinite program (SDP).

Paper Structure

This paper contains 53 sections, 18 theorems, 89 equations, 2 figures, 2 tables, 1 algorithm.

Key Result

Theorem 3.1

In any IREFG, the expected utility of a player $i \in \mathcal{N}$ under a joint behavioral strategy $\mu$ is a polynomial in the entries of $\mu$. As a consequence: These reductions can be carried out in polynomial time.
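
To make the polynomial claim concrete, here is a minimal sketch that evaluates expected utility by walking a small game tree. The tree encoding, the names `TREE`, `expected_utility`, and `u`, and the leaf payoffs (1, 4, 0) are assumptions for illustration. The payoffs are chosen only so that the result matches the single-player polynomial $u(x)=x^2+4x(1-x)$ reported in Figure 1(b), and are not taken from the paper's own construction.

```python
from fractions import Fraction

# Hypothetical tree: a node is either a numeric payoff (leaf) or a dict
# mapping each action to a child node. Both decision nodes sit in one
# infoset, so the same behavioral strategy is applied at each of them.
TREE = {"Left": {"Left": 1, "Right": 4}, "Right": 0}


def expected_utility(node, strategy):
    """Sum, over all leaves, payoff times the product of action probabilities."""
    if not isinstance(node, dict):
        return node  # leaf: return its payoff
    return sum(
        strategy[action] * expected_utility(child, strategy)
        for action, child in node.items()
    )


def u(x):
    """Expected utility when Left is played w.p. x at every node of the infoset."""
    return expected_utility(TREE, {"Left": x, "Right": 1 - x})


if __name__ == "__main__":
    for x in (Fraction(0), Fraction(1, 2), Fraction(2, 3), Fraction(1)):
        # The tree walk agrees with the closed-form polynomial x^2 + 4x(1-x).
        assert u(x) == x**2 + 4 * x * (1 - x)
        print(x, u(x))
```

Because every leaf contributes its payoff times a product of strategy entries, the result is a polynomial in those entries. The shared infoset is what produces the degree-2 term $x^2$ here.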

Figures (2)

  • Figure 1: (a) A two-player zero-sum IREFG with no NE; (b) the single-player absentminded taxi driver IREFG. Dotted lines denote infosets. In (b), P1 cannot distinguish between nodes in the same history, so behavioral strategies (e.g., Left w.p. $x$, Right w.p. $1-x$) yield expected utility $u(x)=x^2+4x(1-x)$, a polynomial in the behavioral strategy space.
  • Figure 2: Constructed Single-Player Imperfect-Recall Game for $p(x)= 2 + 3x_{11}x_{21} - 5x_{12}x_{22} + 4x_{21}^2$.
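
As a worked illustration of the sum-of-squares idea on the Figure 1(b) polynomial (this derivation is not taken from the paper), expand $u(x)=x^2+4x(1-x)=4x-3x^2$ and complete the square:

$$
\frac{4}{3}-u(x) \;=\; 3x^2-4x+\frac{4}{3} \;=\; 3\left(x-\frac{2}{3}\right)^2 \;\ge\; 0 .
$$

The right-hand side is a single square, so it certifies $u(x)\le \tfrac{4}{3}$ for every $x$, with equality at $x=\tfrac{2}{3}\in[0,1]$. For this particular example, a degree-2 certificate already identifies the ex-ante optimal behavioral strategy. The unconstrained maximizer lies inside $[0,1]$, so no multipliers for the interval constraints are needed here; in general they would be.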

Theorems & Definitions (32)

  • Theorem 3.1: Folklore
  • Proposition 3.1
  • Theorem 4.1
  • Theorem 5.1
  • Proposition 6.1
  • Corollary 6.2
  • Theorem 6.3
  • Definition C.1: Realization Probability
  • Definition C.2: Expected Utility for Player $i$
  • Theorem C.3
  • ...and 22 more