Extensive-Form Game Solving via Blackwell Approachability on Treeplexes
Darshan Chakrabarti, Julien Grand-Clément, Christian Kroer
TL;DR
This work introduces the first framework for Blackwell approachability on the sequence-form polytope (treeplex), yielding regret-minimization methods that solve two-player zero-sum extensive-form games (EFGs) via self-play. By working with the conic hull $\mathsf{cone}(\mathcal{T})$ of the treeplex $\mathcal{T}$ and hyperplane forcing, it shows how to obtain a regret minimizer over the treeplex with desirable properties, including stepsize invariance. It then instantiates this framework with PTB$^+$, Smooth PTB$^+$, and AdaGradTB$^+$: PTB$^+$ converges at an $O(1/\sqrt{T})$ rate in self-play, and a stepsize-stabilized variant attains an $O(1/T)$ rate. Empirical results indicate that stepsize invariance at the treeplex level alone does not explain the practical success of CFR$^+$; rather, the infoset-level invariance of CFR$^+$ plays a crucial role, with PCFR$^+$ delivering strong average-iterate performance across benchmarks. Overall, the paper provides a modular, scalable toolkit for Blackwell-based regret minimization in EFGs and clarifies the trade-offs between invariance properties and convergence in practice.
Abstract
In this paper, we introduce the first algorithmic framework for Blackwell approachability on the sequence-form polytope, the class of convex polytopes capturing the strategies of players in extensive-form games (EFGs). This leads to a new class of regret-minimization algorithms that are stepsize-invariant, in the same sense as the Regret Matching and Regret Matching$^+$ algorithms for the simplex. Our modular framework can be combined with any existing regret minimizer over cones to compute a Nash equilibrium in two-player zero-sum EFGs with perfect recall, through the self-play framework. Leveraging predictive online mirror descent, we introduce Predictive Treeplex Blackwell$^+$ (PTB$^+$), and show an $O(1/\sqrt{T})$ convergence rate to Nash equilibrium in self-play. We then show how to stabilize PTB$^+$ with a stepsize, resulting in an algorithm with a state-of-the-art $O(1/T)$ convergence rate. We provide an extensive set of experiments to compare our framework with several algorithmic benchmarks, including CFR$^+$ and its predictive variant, and we highlight interesting connections between practical performance and the stepsize-dependence or stepsize-invariance properties of classical algorithms.
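
The abstract describes the new algorithms as stepsize-invariant "in the same sense as" Regret Matching$^+$ on the simplex. The sketch below illustrates only that simplex baseline, not the treeplex algorithms of the paper: it runs Regret Matching$^+$ in self-play on a small matrix game and checks that rescaling the regret updates leaves every iterate unchanged. The payoff matrix `A`, the scaling parameter `eta`, and all function names are assumptions made for this illustration.

```python
import numpy as np


def rm_plus_strategy(regrets):
    """Map a nonnegative regret vector to a point on the simplex."""
    total = regrets.sum()
    if total <= 0.0:
        # No positive regret yet: fall back to the uniform strategy.
        return np.full(regrets.shape, 1.0 / regrets.size)
    return regrets / total


def rm_plus_self_play(A, num_iters, eta=1.0):
    """RM+ self-play on min_x max_y x^T A y.

    `eta` is a hypothetical scaling of the regret updates, added only to
    demonstrate that the iterates do not depend on it.
    """
    n, m = A.shape
    regrets_x, regrets_y = np.zeros(n), np.zeros(m)
    sum_x, sum_y = np.zeros(n), np.zeros(m)
    iterates = []
    for _ in range(num_iters):
        x = rm_plus_strategy(regrets_x)
        y = rm_plus_strategy(regrets_y)
        iterates.append((x, y))
        sum_x += x
        sum_y += y
        loss_x = A @ y        # row player minimizes x^T A y
        loss_y = -(A.T @ x)   # column player maximizes, so its loss is negated
        # Add each action's instantaneous regret, then threshold at zero.
        regrets_x = np.maximum(regrets_x + eta * (x @ loss_x - loss_x), 0.0)
        regrets_y = np.maximum(regrets_y + eta * (y @ loss_y - loss_y), 0.0)
    return sum_x / num_iters, sum_y / num_iters, iterates


def duality_gap(A, x, y):
    """Best-response value against x minus best-response value against y."""
    return (A.T @ x).max() - (A @ y).min()


# Toy payoff matrix, made up for this example.
A = np.array([[1.0, -1.0, 0.0],
              [-1.0, 0.5, 1.0],
              [0.0, 1.0, -1.0]])

avg_x, avg_y, run_a = rm_plus_self_play(A, num_iters=2000, eta=1.0)
# A power of two keeps the floating-point rescaling exact.
_, _, run_b = rm_plus_self_play(A, num_iters=2000, eta=8.0)

same_iterates = all(
    np.allclose(xa, xb) and np.allclose(ya, yb)
    for (xa, ya), (xb, yb) in zip(run_a, run_b)
)
print("iterates identical across eta:", same_iterates)
print("duality gap of averages:", duality_gap(A, avg_x, avg_y))
```

The invariance follows from positive homogeneity: thresholding at zero commutes with multiplication by a positive constant, and the final normalization cancels that constant, so `eta` never affects the strategies played. A projection-based method such as online mirror descent on the simplex would generally not have this property.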
