
Extensive-Form Game Solving via Blackwell Approachability on Treeplexes

Darshan Chakrabarti, Julien Grand-Clément, Christian Kroer

TL;DR

This work introduces the first framework for Blackwell approachability on the sequence-form polytope (treeplex) and uses it to develop regret-minimization methods for solving two-player zero-sum extensive-form games via self-play. By leveraging the conic hull $\mathsf{cone}(\mathcal{T})$ and hyperplane forcing, it shows how to obtain a regret minimizer over the treeplex with desirable properties, including stepsize invariance. It then instantiates this framework with PTB+, Smooth PTB+, and AdaGradTB+, achieving $O(1/\sqrt{T})$ and $O(1/T)$ convergence rates under different stability and adaptivity regimes. Empirical results indicate that treeplex-level stepsize invariance alone does not explain CFR+'s practical success; rather, the infoset-level stepsize invariance of CFR+ plays a crucial role, and PCFR+ delivers strong average-iterate performance across benchmarks. Overall, the paper provides a modular, scalable toolkit for Blackwell-based regret minimization in EFGs and clarifies the trade-offs between invariance properties and convergence in practice.
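The conic-hull construction can be illustrated on the simplest treeplex, a single decision point, where $\mathcal{T} = \{(1, \bm{p}) : \bm{p} \in \Delta\}$ and the first coordinate is the empty sequence. The Python sketch below rests on our own assumptions and is not the paper's implementation. It uses plain projected gradient steps on $\mathsf{cone}(\mathcal{T})$ as the cone regret minimizer, computes the projection by bisection, and introduces the hypothetical names `project_onto_simplex_cone` and `treeplex_blackwell_simplex`. We take the payoff vector $\bm{g}_t = \bm{u}_t - \langle \bm{x}_t, \bm{u}_t\rangle \bm{e}_{\varnothing}$, which satisfies $\langle \bm{x}_t, \bm{g}_t\rangle = 0$ (hyperplane forcing). Because every $\hat{\bm{x}} \in \mathcal{T}$ has $\hat{x}_{\varnothing} = 1$, the regret against $\hat{\bm{x}}$ equals $\langle \hat{\bm{x}}, \sum_t \bm{g}_t\rangle$.

```python
import numpy as np


def project_onto_simplex_cone(v):
    """Euclidean projection onto C = cone(T) for T = {(1, p) : p in the simplex}.

    C = {(lam, q) : q >= 0, sum(q) = lam}. Substituting lam = sum(q), the KKT
    conditions give q_i = max(0, b_i - tau), where tau is the unique root of the
    strictly increasing function f(tau) = tau + a - sum_i max(0, b_i - tau).
    """
    a, b = v[0], v[1:]

    def f(tau):
        return tau + a - np.maximum(0.0, b - tau).sum()

    lo = -abs(a) - np.abs(b).sum() - 1.0  # guarantees f(lo) < 0
    hi = abs(a) + np.abs(b).max() + 1.0   # guarantees f(hi) > 0
    for _ in range(200):                  # bisection on tau
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    q = np.maximum(0.0, b - 0.5 * (lo + hi))
    return np.concatenate(([q.sum()], q))


def treeplex_blackwell_simplex(utilities, eta=1.0):
    """Hypothetical sketch: Blackwell approachability on cone(T) for a treeplex
    with one decision point, using projected gradient steps on the cone."""
    n = utilities.shape[1]
    e_root = np.zeros(n + 1)
    e_root[0] = 1.0
    R = np.zeros(n + 1)  # conic state, always kept inside cone(T)
    xs, gs = [], []
    for u in utilities:
        # Normalize the conic state back onto T (root coordinate equal to 1).
        x = R / R[0] if R[0] > 0 else np.concatenate(([1.0], np.full(n, 1.0 / n)))
        u_full = np.concatenate(([0.0], u))  # the empty sequence earns no payoff
        g = u_full - (x @ u_full) * e_root   # hyperplane forcing: <x, g> = 0
        R = project_onto_simplex_cone(R + eta * g)
        xs.append(x)
        gs.append(g)
    return np.array(xs), np.array(gs)


rng = np.random.default_rng(0)
U = rng.random((200, 3))  # utilities of 3 actions over 200 rounds
xs, gs = treeplex_blackwell_simplex(U, eta=1.0)
xs_small, _ = treeplex_blackwell_simplex(U, eta=1e-3)
regret = (U.sum(axis=0) - np.einsum("ti,ti->", xs[:, 1:], U)).max()
print(np.abs(xs - xs_small).max(), regret, np.sqrt(2.0 * (gs ** 2).sum()))
```

The projection onto a cone is positively homogeneous, and the strategy is recovered as $\bm{R}/R_{\varnothing}$. Scaling `eta` therefore rescales the conic state but leaves the strategies unchanged up to floating-point error, which is the stepsize invariance described above. The same argument gives a regret bound of $\|\hat{\bm{x}}\|_2 \sqrt{\sum_t \|\bm{g}_t\|_2^2}$ that does not depend on `eta`.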

Abstract

In this paper, we introduce the first algorithmic framework for Blackwell approachability on the sequence-form polytope, the class of convex polytopes capturing the strategies of players in extensive-form games (EFGs). This leads to a new class of regret-minimization algorithms that are stepsize-invariant, in the same sense as the Regret Matching and Regret Matching$^+$ algorithms for the simplex. Our modular framework can be combined with any existing regret minimizer over cones to compute a Nash equilibrium in two-player zero-sum EFGs with perfect recall, through the self-play framework. Leveraging predictive online mirror descent, we introduce Predictive Treeplex Blackwell$^+$ (PTB$^+$), and show an $O(1/\sqrt{T})$ convergence rate to Nash equilibrium in self-play. We then show how to stabilize PTB$^+$ with a stepsize, resulting in an algorithm with a state-of-the-art $O(1/T)$ convergence rate. We provide an extensive set of experiments to compare our framework with several algorithmic benchmarks, including CFR$^+$ and its predictive variant, and we highlight interesting connections between practical performance and the stepsize-dependence or stepsize-invariance properties of classical algorithms.
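For readers less familiar with the sequence form, here is a small sketch of the polytope's defining constraints. The empty sequence has value 1, and at every decision point the values of its actions sum to the value of the parent sequence. The tree, the sequence labels, and the helper names below are hypothetical and chosen only for illustration. Two decision points share the parent sequence `a`, which models an observation revealed after that action.

```python
import numpy as np

# Hypothetical treeplex: decision point j0 with actions {a, b}; after playing a,
# the player observes one of two signals and reaches j1 (actions {c, d}) or
# j2 (actions {e, f}). Index 0 is the empty sequence.
SEQUENCES = ["empty", "a", "b", "a-c", "a-d", "a-e", "a-f"]
# decision point -> (parent sequence index, child sequence indices)
DECISION_POINTS = {"j0": (0, [1, 2]), "j1": (1, [3, 4]), "j2": (1, [5, 6])}


def behavioral_to_sequence_form(behavior):
    """x[sequence] = product of action probabilities along the path to it."""
    x = np.zeros(len(SEQUENCES))
    x[0] = 1.0
    for j, (parent, children) in DECISION_POINTS.items():  # parents listed first
        for child, prob in zip(children, behavior[j]):
            x[child] = x[parent] * prob
    return x


def sequence_form_constraints():
    """Return (F, f) with F x = f encoding x_empty = 1 and, for every decision
    point j, sum of x over j's actions = x[parent(j)]."""
    F = np.zeros((1 + len(DECISION_POINTS), len(SEQUENCES)))
    f = np.zeros(F.shape[0])
    F[0, 0], f[0] = 1.0, 1.0
    for row, (parent, children) in enumerate(DECISION_POINTS.values(), start=1):
        F[row, children] = 1.0
        F[row, parent] = -1.0
    return F, f


behavior = {"j0": [0.3, 0.7], "j1": [0.5, 0.5], "j2": [0.9, 0.1]}
x = behavioral_to_sequence_form(behavior)
F, f = sequence_form_constraints()
print(x)
print(np.allclose(F @ x, f), (x >= 0).all())
```

Under perfect recall, the sequence-form vector determines the behavioral strategy at every decision point reached with positive probability. Expected payoffs also become bilinear, $\bm{x}^{\top}\bm{A}\bm{y}$, in the two players' sequence-form strategies, which is what makes online linear optimization over the treeplex applicable.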

Paper Structure

This paper contains 32 sections, 16 theorems, 60 equations, 12 figures, 1 table, and 8 algorithms.

Key Result

Proposition 2.1

Let $\bm{x}_{1},...,\bm{x}_{T} \in \mathcal{X}$ and $\bm{y}_{1},...,\bm{y}_{T} \in \mathcal{Y}$ be computed in the self-play framework. Let $\left(\bar{\bm{x}}_{T},\bar{\bm{y}}_{T}\right) = \frac{1}{T} \sum_{t=1}^{T}\left(\bm{x}_{t},\bm{y}_{t}\right)$. Then, for ${\sf Reg}^{T}_{1}$ and ${\sf Reg}^{T
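The statement of Proposition 2.1 is cut off above. We assume it is the classical result of Freund and Schapire (1999), which bounds the duality gap of the average strategies by $({\sf Reg}^{T}_{1} + {\sf Reg}^{T}_{2})/T$. For a bilinear objective with regrets measured against the best fixed strategy in hindsight, that bound in fact holds with equality. The sketch below checks this numerically. It uses a random matrix game (simplex strategy sets) and simultaneous RM+ as stand-ins for $\mathcal{X}$, $\mathcal{Y}$, and the regret minimizers, and the function names are hypothetical.

```python
import numpy as np


def rm_plus_strategy(Q):
    """Normalize clipped cumulative regrets; play uniformly if they are all zero."""
    s = Q.sum()
    return Q / s if s > 0 else np.full(Q.size, 1.0 / Q.size)


def self_play(A, T):
    """Simultaneous RM+ self-play on min_x max_y x^T A y (x, y in simplexes)."""
    m, n = A.shape
    Q1, Q2 = np.zeros(m), np.zeros(n)
    xs, ys = [], []
    for _ in range(T):
        x, y = rm_plus_strategy(Q1), rm_plus_strategy(Q2)
        loss1 = A @ y    # player 1's loss vector
        util2 = A.T @ x  # player 2's utility vector
        Q1 = np.maximum(Q1 + (x @ loss1) - loss1, 0.0)
        Q2 = np.maximum(Q2 + util2 - (y @ util2), 0.0)
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)


def regrets_and_gap(A, xs, ys):
    """Signed regrets of both players and the duality gap of the averages."""
    value = sum(x @ A @ y for x, y in zip(xs, ys))
    reg1 = value - (A @ ys.sum(axis=0)).min()    # vs best fixed x in hindsight
    reg2 = (A.T @ xs.sum(axis=0)).max() - value  # vs best fixed y in hindsight
    x_bar, y_bar = xs.mean(axis=0), ys.mean(axis=0)
    gap = (A.T @ x_bar).max() - (A @ y_bar).min()
    return reg1, reg2, gap


A = np.random.default_rng(0).uniform(-1.0, 1.0, size=(4, 5))
T = 1000
reg1, reg2, gap = regrets_and_gap(A, *self_play(A, T))
print(gap, (reg1 + reg2) / T)
```

The same bookkeeping carries over to treeplexes once $\bm{A}$ is the sequence-form payoff matrix, because the objective is still bilinear in $\bm{x}$ and $\bm{y}$.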

Figures (12)

  • Figure 1: Dynamics of RM+ in $\mathbb{R}_{+}^{2}$. We write $\bm{g}_{t}=\bm{g}(\bm{x}_{t},\bm{\ell}_{t})$.
  • Figure 2: Convergence to Nash equilibrium as a function of the number of iterations for PTB+ with quadratic averaging, CFR+ with linear averaging, PCFR+ with quadratic averaging, and SC-POMD with quadratic averaging. All algorithms use alternation.
  • Figure 3: Convergence to Nash equilibrium of the last iterates of PTB+, CFR+, PCFR+, and SC-POMD. All algorithms use alternation.
  • Figure 4: Convergence to Nash equilibrium as a function of the number of iterations for TB+ with quadratic averaging, and for PTB+ and Smooth PTB+ with both quadratic averaging and the last iterate. All algorithms use alternation.
  • Figure 5: Convergence to Nash equilibrium of TB+ as a function of the number of iterations, using uniform, linear, and quadratic averaging as well as the last iterate, with and without alternation.
  • ...and 7 more figures
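The captions above compare uniform, linear, and quadratic averaging with the last iterate. We assume the usual convention from the CFR+ literature, in which iterate $t$ receives weight $1$, $t$, or $t^2$ respectively. A minimal Python sketch of these averages follows, and `weighted_average` and `RunningAverage` are hypothetical helper names. Alternation, which the captions also mention, is a separate choice about update order and is not modeled here.

```python
import numpy as np


def weighted_average(iterates, power):
    """Weight iterate t (t = 1..T) by t**power: 0 uniform, 1 linear, 2 quadratic."""
    iterates = np.asarray(iterates, dtype=float)
    w = np.arange(1, len(iterates) + 1, dtype=float) ** power
    return np.tensordot(w, iterates, axes=1) / w.sum()


class RunningAverage:
    """Same weighted average computed online, without storing past iterates."""

    def __init__(self, power):
        self.power, self.t = power, 0
        self.total, self.weight = 0.0, 0.0

    def update(self, x):
        self.t += 1
        w = float(self.t) ** self.power
        self.total = self.total + w * np.asarray(x, dtype=float)
        self.weight += w

    def value(self):
        return self.total / self.weight


iterates = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
for p in (0, 1, 2):
    print(p, weighted_average(iterates, p))
last_iterate = np.asarray(iterates[-1])  # the "last iterate" is simply x_T
```

The running version is what one would typically use inside a solver loop, since it only stores the weighted sum and the total weight. Heavier weights on later iterates discount the early, poorly tuned strategies faster.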

Theorems & Definitions (27)

  • Proposition 2.1 (Freund and Schapire, 1999)
  • Proposition 3.1
  • Proof
  • Remark 3.2: Comparison with Lagrangian Hedging
  • Proposition 4.1
  • Proposition 4.2
  • Corollary 4.3
  • Proposition 4.4
  • Proposition 4.5
  • Proposition 4.6
  • ...and 17 more