On Separation Between Best-Iterate, Random-Iterate, and Last-Iterate Convergence of Learning in Games

Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng

TL;DR

The paper investigates non-ergodic convergence notions for learning in two-player zero-sum games, focusing on last-, random-, and best-iterate convergence measured by the duality gap. It proves a separation: OMWU exhibits a polynomial uniform best-iterate rate $O(T^{-\frac{1}{6}})$ in $2\times2$ games with a fully mixed Nash equilibrium, while it has no polynomial uniform random-iterate rate and lacks uniform last-iterate convergence. The authors introduce a novel two-phase analysis connecting dynamic regret and interval regret to establish the best-iterate bound, with a global phase controlling random-iterate behavior and an initial phase achieving fast convergence toward an $O(\delta)$ duality gap. These results refine the understanding of how different convergence notions relate and demonstrate that uniform best-iterate guarantees can hold without corresponding random-iterate guarantees, offering new techniques for analyzing best-iterate performance. The findings have implications for designing and analyzing learning dynamics in games, particularly in understanding when subsequential best performance can be achieved independently of average or random performance.
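
For reference, the three convergence notions compared above can be written out as follows. This is a sketch using standard definitions. The notation is an assumption and may differ from the paper's: $x_t, y_t$ denote the two players' mixed strategies at iteration $t$, $\Delta^2$ the probability simplex over two actions, and $A$ the loss matrix, with the $x$-player minimizing.

```latex
% Duality gap of a strategy pair (x, y); it is nonnegative and is zero exactly at a Nash equilibrium.
\mathrm{DualityGap}(x, y) = \max_{y' \in \Delta^2} x^\top A y' - \min_{x' \in \Delta^2} x'^\top A y

% Three non-ergodic performance measures after T iterations.
\text{best-iterate:}   \quad \min_{1 \le t \le T} \mathrm{DualityGap}(x_t, y_t)
\text{random-iterate:} \quad \frac{1}{T} \sum_{t=1}^{T} \mathrm{DualityGap}(x_t, y_t)
\text{last-iterate:}   \quad \mathrm{DualityGap}(x_T, y_T)

% The minimum is at most the average and at most the final value.
\min_{1 \le t \le T} \mathrm{DualityGap}(x_t, y_t)
  \le \frac{1}{T} \sum_{t=1}^{T} \mathrm{DualityGap}(x_t, y_t),
\qquad
\min_{1 \le t \le T} \mathrm{DualityGap}(x_t, y_t) \le \mathrm{DualityGap}(x_T, y_T)
```

Because the best iterate is never worse than the average, any random-iterate rate implies a best-iterate rate of the same order. The separation described above shows that the converse can fail.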

Abstract

Non-ergodic convergence of learning dynamics in games has been widely studied recently because of its importance in both theory and practice. Recent work (Cai et al., 2024) showed that a broad class of learning dynamics, including Optimistic Multiplicative Weights Update (OMWU), can exhibit arbitrarily slow last-iterate convergence even in simple $2 \times 2$ matrix games, despite many of these dynamics being known to converge asymptotically in the last iterate. It remains unclear, however, whether these algorithms achieve fast non-ergodic convergence under weaker criteria, such as best-iterate convergence. We show that for $2\times 2$ matrix games, OMWU achieves an $O(T^{-1/6})$ best-iterate convergence rate, in stark contrast to its slow last-iterate convergence in the same class of games. Furthermore, we establish a lower bound showing that OMWU does not achieve any polynomial random-iterate convergence rate, measured by the expected duality gaps across all iterates. This result challenges the conventional wisdom that random-iterate convergence is essentially equivalent to best-iterate convergence, with the former often used as a proxy for establishing the latter. Our analysis uncovers a new connection to dynamic regret and presents a novel two-phase approach to best-iterate convergence, which could be of independent interest.
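
To make the three quantities concrete, the following is a minimal sketch that runs OMWU on a $2 \times 2$ game and reports the best-, random-, and last-iterate duality gaps. Several things here are assumptions rather than the paper's setup: the update is the common one-step form of OMWU (multiply by the exponential of twice the current loss minus the previous loss, then normalize), both players start from the uniform strategy, and the example matrix is a hypothetical game with a fully mixed equilibrium, not the hard instance $A_\delta$ used in the paper. The function names `duality_gap`, `run_omwu`, and `summarize` are illustrative.

```python
import numpy as np


def duality_gap(A, x, y):
    """Duality gap of (x, y) when x minimizes and y maximizes x^T A y."""
    # Best response value for the maximizer minus best response value for the minimizer.
    return float(np.max(A.T @ x) - np.min(A @ y))


def run_omwu(A, eta=0.5, T=1000):
    """Run T iterations of OMWU (assumed one-step form) and return the duality gap of each iterate."""
    n, m = A.shape
    x = np.full(n, 1.0 / n)  # uniform initialization (assumption)
    y = np.full(m, 1.0 / m)
    # Initialize the "previous" feedback to the current one, so the first
    # update reduces to a plain multiplicative weights step.
    prev_loss_x = A @ y
    prev_gain_y = A.T @ x
    gaps = []
    for _ in range(T):
        gaps.append(duality_gap(A, x, y))
        loss_x = A @ y    # loss vector observed by the minimizing player
        gain_y = A.T @ x  # payoff vector observed by the maximizing player
        # Optimistic step: use 2 * current - previous feedback as the prediction-corrected signal.
        x_new = x * np.exp(-eta * (2.0 * loss_x - prev_loss_x))
        y_new = y * np.exp(eta * (2.0 * gain_y - prev_gain_y))
        x, y = x_new / x_new.sum(), y_new / y_new.sum()
        prev_loss_x, prev_gain_y = loss_x, gain_y
    return np.array(gaps)


def summarize(gaps):
    """Best-, random- (average), and last-iterate duality gaps of a trajectory."""
    return {
        "best": float(gaps.min()),
        "random": float(gaps.mean()),
        "last": float(gaps[-1]),
    }


if __name__ == "__main__":
    # Hypothetical 2x2 loss matrix in [0, 1] whose unique equilibrium is
    # fully mixed: both players play (1/3, 2/3).
    A = np.array([[1.0, 0.0], [0.0, 0.5]])
    print(summarize(run_omwu(A, eta=0.5, T=1000)))
```

On any trajectory the reported best-iterate gap is at most the random-iterate gap and at most the last-iterate gap, by construction. A well-conditioned game like this one does not by itself exhibit the slow random-iterate behavior established in the lower bound, which concerns worst-case games within the class.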

Paper Structure

This paper contains 54 sections, 20 theorems, 76 equations, 1 figure, 1 table.

Key Result

Theorem 1

For two-player zero-sum games with loss matrix $A\in [0,1]^{2 \times 2}$, the uniform random-iterate convergence rate of OMWU with any constant step size $\eta\le \frac{1}{2}$ is $\Omega(\frac{1}{\log T})$. This result continues to hold if we restrict the space of loss matrices to games with fully-m

Figures (1)

  • Figure 1: Random-iterate convergence/average social dynamic regret guarantee of OGDA and OFTRL algorithms with log, entropy, and squared Euclidean norm regularizer. The game is $A_\delta$ defined in \ref{eq:A delta} with $\delta = 10^{-2}$. The red region is when the single iterate has a duality gap $\ge 0.1$. We intentionally show different numbers of iterations for different regularizers as illustrated in our lower bounds (\ref{thm:random-lower-OMWU} and \ref{thm:random-lower-all-regularizers}) and proofs (see \ref{sec:random proof overview}).

Theorems & Definitions (32)

  • Theorem 1
  • Remark 1
  • Theorem 2
  • Theorem 3
  • Proposition 1
  • Lemma 1: Adapted from Lemma 1 of Wei et al. (2021)
  • Lemma 2: Adapted from Lemma 19 of Wei et al. (2021)
  • Lemma 3: Bounded Interval Regret
  • Lemma 4
  • Theorem 4
  • ...and 22 more