Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games

Yang Cai; Gabriele Farina; Julien Grand-Clément; Christian Kroer; Chung-Wei Lee; Haipeng Luo; Weiqiang Zheng


TL;DR

This work addresses the elusive question of last-iterate convergence for Regret Matching$^+$-based dynamics in two-player zero-sum games. It introduces ExRM$^+$ and SPRM$^+$, variants grounded in extragradient and optimistic-style updates, and proves that they exhibit asymptotic last-iterate convergence in both duality gap and iterates, with an $O(1/\sqrt{t})$ best-iterate rate and, when combined with restarting, linear last-iterate convergence. A Minty-condition-based analysis reveals a geometric structure of limit points, enabling convergence proofs despite non-monotone regret operators. Numerical experiments on matrix games, Kuhn poker, and Goofspiel corroborate the theory, showing substantial improvements over RM$^+$-type methods and demonstrating the practical value of restart schemes. The results offer a fresh variational-inequality perspective on last-iterate convergence for non-monotone operators and pave the way for robust RM$^+$-based solvers in large-scale extensive-form games.

Abstract

We study last-iterate convergence properties of algorithms for solving two-player zero-sum games based on Regret Matching$^+$ (RM$^+$). Despite their widespread use for solving real games, virtually nothing is known about their last-iterate convergence. A major obstacle to analyzing RM-type dynamics is that their regret operators lack Lipschitzness and (pseudo)monotonicity. We start by showing numerically that several variants used in practice, such as RM$^+$, predictive RM$^+$ and alternating RM$^+$, all lack last-iterate convergence guarantees even on a simple $3\times 3$ matrix game. We then prove that recent variants of these algorithms based on a smoothing technique, extragradient RM$^{+}$ and smooth Predictive RM$^+$, enjoy asymptotic last-iterate convergence (without a rate), $1/\sqrt{t}$ best-iterate convergence, and when combined with restarting, linear-rate last-iterate convergence. Our analysis builds on a new characterization of the geometric structure of the limit points of our algorithms, marking a significant departure from most of the literature on last-iterate convergence. We believe that our analysis may be of independent interest and offers a fresh perspective for studying last-iterate convergence in algorithms based on non-monotone operators.
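The convergence notions above are typically measured through the duality gap of a strategy pair. The following is a minimal sketch of that quantity, assuming the game is written as $\min_x \max_y x^\top A y$ over probability simplices (the convention is an assumption, not stated on this page). The helper name `duality_gap` is hypothetical; the matrix is the one quoted in the Figure 1 caption, evaluated here only at the uniform strategies.

```python
def duality_gap(A, x, y):
    """Duality gap of (x, y), assuming x minimizes and y maximizes x^T A y."""
    n, m = len(A), len(A[0])
    # Payoff the maximizer gets from each pure column against x: (A^T x)_j
    col_payoffs = [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
    # Loss the minimizer suffers from each pure row against y: (A y)_i
    row_losses = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
    # Best-response value for y minus best-response value for x
    return max(col_payoffs) - min(row_losses)


# Payoff matrix quoted in the Figure 1 caption
A = [[3, 0, -3], [0, 3, -4], [0, 0, 1]]
uniform = [1 / 3, 1 / 3, 1 / 3]
gap = duality_gap(A, uniform, uniform)
print(gap)
```

The gap is nonnegative for any strategy pair and is zero exactly at a Nash equilibrium, which is why it serves as the convergence measure: last-iterate convergence asks that the gap of the current iterate $(x^t, y^t)$ vanish, not just the gap of the averaged iterates.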


Paper Structure (41 sections, 28 theorems, 72 equations, 14 figures, 10 algorithms)


Introduction
Preliminaries on Regret Matching$^+$
Notation.
Regret Matching$^+$ (RM$^+$) and its variants.
Extragradient RM$^+$ and Smooth Predictive RM$^+$
Non-convergence of RM$^+$, alternating RM$^+$, and PRM$^+$
Convergence Properties of ExRM$^+$
Convergence of the Iterates
Convergence in the Duality Gap Does Not Rule Out Cycling.
Geometric Structure of Limit Points
Best-Iterate Convergence Rate of ExRM$^+$
Linear Last-Iterate Convergence for ExRM$^+$ with Restarts
Last-Iterate Convergence of SPRM$^+$
Numerical Experiments
Conclusions
...and 26 more sections

Key Result

Theorem 1

If a matrix game has a strict Nash equilibrium $(x^\star, y^\star)$, then RM$^+$ converges in last-iterate, that is, $\lim_{t\rightarrow \infty}\{(x^t, y^t)\} = (x^\star, y^\star)$.
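As an illustration of the statement above, here is a minimal sketch of simultaneous RM$^+$ on a small game with a strict Nash equilibrium. Several details are assumptions made for the example rather than taken from this page: the game is written as $\min_x \max_y x^\top A y$, both players update simultaneously, the aggregate regrets start at zero, and a zero regret vector maps to the uniform strategy. The names `normalize` and `rm_plus` and the $2\times 2$ matrix are hypothetical.

```python
def normalize(R):
    """Map a nonnegative regret vector to a strategy (uniform if it is all zeros)."""
    s = sum(R)
    n = len(R)
    return [r / s for r in R] if s > 0 else [1.0 / n] * n


def rm_plus(A, T):
    """Run T rounds of simultaneous RM+ for min_x max_y x^T A y; return the last iterate."""
    n, m = len(A), len(A[0])
    Rx, Ry = [0.0] * n, [0.0] * m  # aggregate (thresholded) regrets
    for _ in range(T):
        x, y = normalize(Rx), normalize(Ry)
        loss = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]  # A y
        util = [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]  # A^T x
        exp_loss = sum(x[i] * loss[i] for i in range(n))
        exp_util = sum(y[j] * util[j] for j in range(m))
        # Add instantaneous regrets, then clip at zero (the "+" in RM+)
        Rx = [max(Rx[i] + exp_loss - loss[i], 0.0) for i in range(n)]
        Ry = [max(Ry[j] + util[j] - exp_util, 0.0) for j in range(m)]
    return normalize(Rx), normalize(Ry)


# Hypothetical game with a strict pure equilibrium at (row 0, column 0):
# the row player strictly prefers row 0, the column player strictly prefers column 0.
A = [[2, 1], [4, 3]]
x_last, y_last = rm_plus(A, 100)
print(x_last, y_last)
```

On this instance the last iterate settles on the pure equilibrium. Games without a strict equilibrium, such as the $3\times 3$ example in Figure 1, are not covered by this statement.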

Figures (14)

Figure 1: Duality gap of the current iterates generated by RM$^+$, PRM$^+$, and their alternating variants on the zero-sum game with payoff matrix $A = [[3,0,-3],[0,3,-4],[0,0,1]]$.
Figure 2: Pictorial illustration of Lemma 4 (Structure of Limit Points).
Figure 3: Empirical performances of several algorithms on the $3 \times 3$ matrix game (left plot), Kuhn poker and Goofspiel (center plots), and random instances (right plot).
Figure 4: Last $2000$ iterates of Regret Matching$^+$ after $10^5$ iterations for solving the matrix game from Figure 1.
Figure 5: Last $2000$ iterates of Alternating Regret Matching$^+$ after $10^5$ iterations for solving the matrix game from Figure 1.
...and 9 more figures

Theorems & Definitions (53)

Theorem 1: Convergence of RM$^+$ to Strict NE
Lemma 1
Lemma 2: Adapted from Lemma 12.1.10 in Facchinei and Pang (2003)
Lemma 3
Proposition 1
Lemma 4: Structure of Limit Points
proof
Lemma 5: Unique limit point
proof
Theorem 2: Last-Iterate Convergence of ExRM$^+$
...and 43 more
