Scale-Invariant Regret Matching and Online Learning with Optimal Convergence: Bridging Theory and Practice in Zero-Sum Games

Brian Hu Zhang; Ioannis Anagnostides; Tuomas Sandholm

TL;DR

Addressing the long-standing gap between theory and practice in zero-sum online learning, the paper introduces a scale-invariant, parameter-free variant of regret matching (IREG-PRM$^+$) and an adaptive optimistic gradient descent (AdOGD) with RVU-type guarantees, achieving $O_T(1/T)$ average-iterate and $O_T(1/\sqrt{T})$ best-iterate convergence. It further develops IR-PRM and IR-PRM$^+$ with predictions and a nondecreasing regret norm, plus an extragradient variant (IREG-PRM$^+$ EG) that yields $O_T(1/T)$ equilibrium guarantees in zero-sum games, all while maintaining competitive performance in benchmarks. The work unifies regret matching with gradient-based optimization, clarifying why RM-based methods perform well in practice and delivering parameter-free, scale-invariant algorithms with strong theoretical and empirical convergence guarantees. Overall, it closes the theory-practice gap in zero-sum game solving and provides practical, scalable tools for self-play and adversarial learning.
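For orientation, the RVU ("regret bounded by variation in utilities") property mentioned above is usually stated in the literature in roughly the following form. This is the generic textbook statement, not necessarily the exact scale-invariant version the paper proves; the symbols $x^t$ (strategy at round $t$), $u^t$ (utility vector at round $t$), and the constants $\alpha, \beta, \gamma$ are notation assumed here for illustration.

$$
\mathrm{Reg}^T \;\le\; \alpha \;+\; \beta \sum_{t=1}^{T} \lVert u^t - u^{t-1} \rVert_*^2 \;-\; \gamma \sum_{t=1}^{T} \lVert x^t - x^{t-1} \rVert^2, \qquad \alpha > 0,\ \beta, \gamma > 0.
$$

In a two-player zero-sum game each player's utility variation is controlled by the other player's strategy movement, so when the constants are suitably related the negative terms cancel the positive ones after summing over both players. The sum of regrets then stays bounded in $T$, which is what gives an $O_T(1/T)$ rate for the average iterates.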

Abstract

A considerable chasm has been looming for decades between theory and practice in zero-sum game solving through first-order methods. Although a convergence rate of $T^{-1}$ has long been established since Nemirovski's mirror-prox algorithm and Nesterov's excessive gap technique in the early 2000s, the most effective paradigm in practice is *counterfactual regret minimization*, which is based on *regret matching* and its modern variants. In particular, the state of the art across most benchmarks is *predictive* regret matching$^+$ (PRM$^+$), in conjunction with non-uniform averaging. Yet, such algorithms can exhibit slower $\Omega(T^{-1/2})$ convergence even in self-play. In this paper, we close the gap between theory and practice. We propose a new scale-invariant and parameter-free variant of PRM$^+$, which we call IREG-PRM$^+$. We show that it achieves $T^{-1/2}$ best-iterate and $T^{-1}$ (i.e., optimal) average-iterate convergence guarantees, while also being on par with PRM$^+$ on benchmark games. From a technical standpoint, we draw an analogy between IREG-PRM$^+$ and optimistic gradient descent with an *adaptive* learning rate. The basic flaw of PRM$^+$ is that the ($\ell_2$-)norm of the regret vector -- which can be thought of as the inverse of the learning rate -- can decrease. By contrast, we design IREG-PRM$^+$ so as to maintain the invariant that the norm of the regret vector is nondecreasing. This enables us to derive an RVU-type bound for IREG-PRM$^+$, the first such property that does not rely on introducing additional hyperparameters to enforce smoothness. Furthermore, we find that IREG-PRM$^+$ performs on par with an adaptive version of optimistic gradient descent that we introduce whose learning rate depends on the misprediction error, demystifying the effectiveness of the regret matching family *vis-a-vis* more standard optimization techniques.
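To make the quantity discussed in the abstract concrete, here is a minimal sketch of the baseline PRM$^+$ in self-play on a small matrix game, recording the $\ell_2$-norm of the row player's regret vector at every round. It is not IREG-PRM$^+$, whose update rule is not given on this page. The sketch assumes the common formulation of PRM$^+$ in which the prediction is the previously observed utility vector and uniform averaging is used; the class name `PRMPlus`, the function `self_play`, and the example payoff matrix are all hypothetical names chosen for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_norm(v):
    return math.sqrt(dot(v, v))


class PRMPlus:
    """Baseline predictive regret matching+ for one player (utility form).

    Assumption: the prediction is the last observed utility vector.
    """

    def __init__(self, n):
        self.regret = [0.0] * n          # thresholded regret vector (always >= 0)
        self.strategy = [1.0 / n] * n    # last strategy played
        self.prediction = [0.0] * n      # predicted utility for the next round

    def next_strategy(self):
        # Add the predicted instantaneous regret, clip at zero, then normalize.
        ev = dot(self.prediction, self.strategy)
        theta = [max(r + m - ev, 0.0)
                 for r, m in zip(self.regret, self.prediction)]
        total = sum(theta)
        n = len(theta)
        self.strategy = [t / total for t in theta] if total > 0.0 else [1.0 / n] * n
        return self.strategy

    def observe(self, utility):
        # Accumulate the realized instantaneous regret and clip at zero.
        ev = dot(utility, self.strategy)
        self.regret = [max(r + u - ev, 0.0)
                       for r, u in zip(self.regret, utility)]
        self.prediction = list(utility)


def self_play(A, T):
    """Row player maximizes x^T A y, column player minimizes it."""
    n, m = len(A), len(A[0])
    row, col = PRMPlus(n), PRMPlus(m)
    avg_x, avg_y = [0.0] * n, [0.0] * m
    norms = []  # l2 norm of the row player's regret vector per round
    for _ in range(T):
        x, y = row.next_strategy(), col.next_strategy()
        u_row = [dot(A[i], y) for i in range(n)]
        u_col = [-sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        row.observe(u_row)
        col.observe(u_col)
        avg_x = [a + xi / T for a, xi in zip(avg_x, x)]
        avg_y = [a + yj / T for a, yj in zip(avg_y, y)]
        norms.append(l2_norm(row.regret))
    # Duality gap of the uniformly averaged strategies.
    best_row = max(dot(A[i], avg_y) for i in range(n))
    best_col = min(sum(A[i][j] * avg_x[i] for i in range(n)) for j in range(m))
    return avg_x, avg_y, best_row - best_col, norms


if __name__ == "__main__":
    A = [[2.0, -1.0], [-1.0, 1.0]]  # hypothetical example game
    avg_x, avg_y, gap, norms = self_play(A, 2000)
    decreases = sum(1 for a, b in zip(norms, norms[1:]) if b < a)
    print("duality gap:", gap)
    print("rounds where the regret norm decreased:", decreases)
```

Inspecting `norms` shows how the regret norm, which the abstract interprets as an inverse learning rate, evolves under the baseline update; nothing in this update prevents it from shrinking between rounds, which is the behavior the abstract identifies as the flaw that IREG-PRM$^+$ is designed to rule out.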
