Extensive-Form Game Solving via Blackwell Approachability on Treeplexes

Darshan Chakrabarti; Julien Grand-Clément; Christian Kroer

TL;DR

This work introduces the first framework for Blackwell approachability on the sequence-form polytope (treeplex), yielding regret-minimization methods that solve two-player zero-sum extensive-form games via self-play. By leveraging the conic hull $\mathsf{cone}(\mathcal{T})$ and hyperplane forcing, it shows how to obtain a regret minimizer over the treeplex with desirable properties, including stepsize invariance. It then instantiates this framework with PTB+, Smooth PTB+, and AdaGradTB+; PTB+ converges at an $O(1/\sqrt{T})$ rate and the stabilized Smooth PTB+ at an $O(1/T)$ rate. Empirical results indicate that treeplex-level stepsize invariance alone does not explain CFR+'s practical success; rather, the infoset-level stepsize invariance of CFR+ plays a crucial role, with PCFR+ delivering strong average-iterate performance across benchmarks. Overall, the paper provides a modular toolkit for Blackwell-based regret minimization in EFGs and clarifies the trade-offs between invariance properties and convergence in practice.

Abstract

In this paper, we introduce the first algorithmic framework for Blackwell approachability on the sequence-form polytope, the class of convex polytopes capturing the strategies of players in extensive-form games (EFGs). This leads to a new class of regret-minimization algorithms that are stepsize-invariant, in the same sense as the Regret Matching and Regret Matching$^+$ algorithms for the simplex. Our modular framework can be combined with any existing regret minimizer over cones to compute a Nash equilibrium in two-player zero-sum EFGs with perfect recall, through the self-play framework. Leveraging predictive online mirror descent, we introduce Predictive Treeplex Blackwell$^+$ (PTB$^+$), and show an $O(1/\sqrt{T})$ convergence rate to Nash equilibrium in self-play. We then show how to stabilize PTB$^+$ with a stepsize, resulting in an algorithm with a state-of-the-art $O(1/T)$ convergence rate. We provide an extensive set of experiments to compare our framework with several algorithmic benchmarks, including CFR$^+$ and its predictive variant, and we highlight interesting connections between practical performance and the stepsize-dependence or stepsize-invariance properties of classical algorithms.
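To make the notion of stepsize invariance concrete, the following is a minimal sketch of Regret Matching$^+$ on the simplex, not the treeplex algorithms of the paper. The function name `rm_plus`, the `scale` parameter (standing in for a stepsize), and the loss sequence are all hypothetical and chosen only for illustration. The point it shows is that multiplying every regret update by a positive constant leaves the played strategies unchanged, because each strategy is a normalization of the thresholded regret vector.

```python
def rm_plus(losses, scale=1.0):
    """Run Regret Matching+ on the simplex and return the list of iterates.

    `scale` plays the role of a stepsize: it multiplies every regret update.
    """
    n = len(losses[0])
    regrets = [0.0] * n  # thresholded cumulative regrets, always >= 0
    iterates = []
    for loss in losses:
        total = sum(regrets)
        # Play proportionally to the regrets (uniform if they are all zero).
        x = [r / total for r in regrets] if total > 0 else [1.0 / n] * n
        iterates.append(x)
        expected = sum(xi * li for xi, li in zip(x, loss))
        # Add the scaled instantaneous regret of each action, then threshold at 0.
        regrets = [max(0.0, r + scale * (expected - li))
                   for r, li in zip(regrets, loss)]
    return iterates


# Hypothetical loss vectors over three actions.
losses = [[1.0, 0.0, 0.5], [0.2, 0.9, 0.4], [0.7, 0.3, 0.1], [0.0, 1.0, 0.6]]
base = rm_plus(losses, scale=1.0)
scaled = rm_plus(losses, scale=8.0)
# The two runs produce the same strategies up to floating-point error.
same = all(abs(a - b) < 1e-12
           for xa, xb in zip(base, scaled) for a, b in zip(xa, xb))
print(same)
```

Scaling the updates rescales the regret vector by the same constant at every step, and the normalization cancels it, which is why no stepsize needs to be tuned.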

Paper Structure

This paper contains 32 sections, 16 theorems, 60 equations, 12 figures, 1 table, 8 algorithms.

Introduction
Preliminaries on EFGs
Extensive-form games.
Treeplexes.
Regret minimization and self-play framework.
CFR and Regret Matching$^+$.
Blackwell Approachability on Treeplexes
Instantiations of Algorithm \ref{alg:blackwell-approachability based regmin}
Predictive Treeplex Blackwell$^+$ (PTB$^+$).
A stable algorithm: Smooth PTB+.
An adaptive algorithm: AdaGradTB+.
Numerical Experiments
Conclusion
Acknowledgments
Self-Play Framework
...and 17 more sections

Key Result

Proposition 2.1

Let $\bm{x}_{1},...,\bm{x}_{T} \in \mathcal{X}$ and $\bm{y}_{1},...,\bm{y}_{T} \in \mathcal{Y}$ be computed in the self-play framework. Let $\left(\bar{\bm{x}}_{T},\bar{\bm{y}}_{T}\right) = \frac{1}{T} \sum_{t=1}^{T}\left(\bm{x}_{t},\bm{y}_{t}\right)$. Then, for ${\sf Reg}^{T}_{1}$ and ${\sf Reg}^{T
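The statement above is cut off in the source, and the sketch below does not attempt to reconstruct it. It only illustrates the self-play setup the proposition refers to, using a well-known fact for bilinear zero-sum games: the duality gap of the uniform average strategies is at most the sum of the two players' regrets divided by $T$. The assumptions are loud ones: a small matrix game stands in for an EFG, Regret Matching$^+$ stands in for the regret minimizer, and the payoff matrix `A` and all function names are hypothetical.

```python
# Hypothetical payoff matrix: the x-player minimizes x^T A y, the y-player maximizes it.
A = [[0.0, 1.0, -1.0],
     [-1.0, 0.0, 2.0],
     [1.0, -2.0, 0.0]]


def strategy(regrets):
    """Normalize thresholded regrets into a simplex point (uniform if all zero)."""
    total = sum(regrets)
    n = len(regrets)
    return [r / total for r in regrets] if total > 0 else [1.0 / n] * n


def update(regrets, x, loss):
    """Regret Matching+ update: add instantaneous regrets, threshold at zero."""
    expected = sum(xi * li for xi, li in zip(x, loss))
    return [max(0.0, r + expected - li) for r, li in zip(regrets, loss)]


def self_play(A, T):
    """Run simultaneous self-play; return (duality gap, regret of x, regret of y)."""
    n, m = len(A), len(A[0])
    rx, ry = [0.0] * n, [0.0] * m
    cum_loss_x, cum_loss_y = [0.0] * n, [0.0] * m
    sum_x, sum_y = [0.0] * n, [0.0] * m
    value_sum = 0.0  # running sum of x_t^T A y_t
    for _ in range(T):
        x, y = strategy(rx), strategy(ry)
        loss_x = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]   # A y_t
        loss_y = [-sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]  # -A^T x_t
        value_sum += sum(xi * li for xi, li in zip(x, loss_x))
        rx, ry = update(rx, x, loss_x), update(ry, y, loss_y)
        cum_loss_x = [c + l for c, l in zip(cum_loss_x, loss_x)]
        cum_loss_y = [c + l for c, l in zip(cum_loss_y, loss_y)]
        sum_x = [s + v for s, v in zip(sum_x, x)]
        sum_y = [s + v for s, v in zip(sum_y, y)]
    # Regret against the best fixed strategy, which is a vertex of the simplex.
    reg_x = value_sum - min(cum_loss_x)
    reg_y = -value_sum - min(cum_loss_y)
    x_bar = [s / T for s in sum_x]
    y_bar = [s / T for s in sum_y]
    # Duality gap: max_y x_bar^T A y - min_x x^T A y_bar.
    best_y = max(sum(A[i][j] * x_bar[i] for i in range(n)) for j in range(m))
    best_x = min(sum(A[i][j] * y_bar[j] for j in range(m)) for i in range(n))
    return best_y - best_x, reg_x, reg_y


T = 5000
gap, reg_x, reg_y = self_play(A, T)
print(gap, (reg_x + reg_y) / T)
```

Because both regrets grow sublinearly in $T$ for a regret minimizer, the printed gap shrinks as `T` increases; with linear losses and uniform averaging the two printed numbers coincide up to rounding.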

Figures (12)

Figure 1: Dynamics of RM+ in $\mathbb{R}_{+}^{2}$. We write $\bm{g}_{t}=\bm{g}(\bm{x}_{t},\bm{\ell}_{t})$.
Figure 2: Convergence to Nash equilibrium as a function of number of iterations for PTB+ with quadratic averaging, CFR+ with linear averaging, PCFR+ with quadratic averaging, and SC-POMD with quadratic averaging. Every algorithm is using alternation.
Figure 3: Convergence to Nash equilibrium for the last iterates of PTB+, CFR+, PCFR+, and SC-POMD. Every algorithm is using alternation.
Figure 4: Convergence to Nash equilibrium as a function of number of iterations for TB+ with quadratic averaging, PTB+ with quadratic averaging and last iterate, and Smooth PTB+ with quadratic averaging and last iterate. Every algorithm is using alternation.
Figure 5: Convergence to Nash equilibrium as a function of number of iterations using uniform, linear, and quadratic averaging, as well as the last iterate, with and without alternation for TB+.
...and 7 more figures

Theorems & Definitions (27)

Proposition 2.1: freund1999adaptive
Proposition 3.1
proof
Remark 3.2: Comparison with Lagrangian Hedging
Proposition 4.1
Proposition 4.2
Corollary 4.3
Proposition 4.4
Proposition 4.5
Proposition 4.6
...and 17 more
