
Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms

Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng

TL;DR

The work shows that forgetfulness is necessary for fast last-iterate convergence of learning dynamics in two-player zero-sum games. By constructing a simple parametric 2×2 game $A_\delta$, it proves that OFTRL variants (including OMWU) can keep the duality gap bounded below by a constant for a number of rounds on the order of $c_1/(\eta L \delta)$. Because this happens already in a 2×2 game, no last-iterate rate that depends only on the problem size can hold. This slow behavior contrasts with OGDA, which achieves $O(1/\sqrt{T})$ last-iterate convergence. The work also extends the negative result to higher dimensions via a reduction based on duplication and scaling. The findings imply that achieving fast last-iterate convergence in general games requires forgetting mechanisms, guiding future algorithm design and analysis toward forgetful online learning frameworks.
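The duality gap used throughout can be computed directly from its definition. Below is a minimal Python sketch, assuming the row player $x$ minimizes and the column player $y$ maximizes $x^\top A y$ (this sign convention is an assumption here, not taken from the page). The gap is always nonnegative and equals zero exactly when $(x, y)$ is a Nash equilibrium.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def duality_gap(A: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """max_{y'} x^T A y' - min_{x'} x'^T A y for mixed strategies x, y.

    Assumed convention: x (rows) minimizes and y (columns) maximizes x^T A y.
    """
    rows, cols = range(len(A)), range(len(A[0]))
    # Column player's best-response value: largest entry of x^T A
    best_y = max(sum(x[i] * A[i][j] for i in rows) for j in cols)
    # Row player's best-response value: smallest entry of A y
    best_x = min(sum(A[i][j] * y[j] for j in cols) for i in rows)
    return best_y - best_x


# Example: a matching-pennies-style game with entries in [0, 1]
A = [[1.0, 0.0], [0.0, 1.0]]
print(duality_gap(A, [0.5, 0.5], [0.5, 0.5]))  # 0.0 (uniform play is the equilibrium)
print(duality_gap(A, [1.0, 0.0], [1.0, 0.0]))  # 1.0
```

Only best responses to pure strategies are needed, because a linear function over the simplex attains its maximum and minimum at a vertex.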

Abstract

Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games, both in theory and practice. Particularly popular algorithms include optimistic multiplicative weights update (OMWU) and optimistic gradient-descent-ascent (OGDA). While both algorithms enjoy $O(1/T)$ ergodic convergence to Nash equilibrium in two-player zero-sum games, OMWU offers several advantages including logarithmic dependence on the size of the payoff matrix and $\widetilde{O}(1/T)$ convergence to coarse correlated equilibria even in general-sum games. However, in terms of last-iterate convergence in two-player zero-sum games, an increasingly popular topic in this area, OGDA guarantees that the duality gap shrinks at a rate of $O(1/\sqrt{T})$, while the best existing last-iterate convergence for OMWU depends on some game-dependent constant that could be arbitrarily large. This raises the question: is this potentially slow last-iterate convergence an inherent disadvantage of OMWU, or is the current analysis too loose? Somewhat surprisingly, we show that the former is true. More generally, we prove that a broad class of algorithms that do not forget the past quickly all suffer the same issue: for any arbitrarily small $\delta>0$, there exists a $2\times 2$ matrix game such that the algorithm admits a constant duality gap even after $1/\delta$ rounds. This class of algorithms includes OMWU and other standard optimistic follow-the-regularized-leader algorithms.
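The two update rules contrasted in the abstract can be sketched as follows. This is a minimal Python illustration, not the authors' code. It assumes that the row player $x$ minimizes $x^\top A y$, that both players start from the uniform distribution, that OMWU is written as OFTRL with the negative-entropy regularizer and the last observed loss as the optimistic prediction, and that OGDA uses the common two-sequence projected form. The matrix `A_hyp` (with `delta = 0.1`) is a hypothetical 2×2 game chosen only for illustration; it is not claimed to be the paper's $A_\delta$, and the printed gaps are not results from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(A: Matrix, y: Sequence[float]) -> Vector:
    # (A y)_i: loss of row i for the minimizing row player
    return [sum(a * b for a, b in zip(row, y)) for row in A]


def vec_mat(x: Sequence[float], A: Matrix) -> Vector:
    # (x^T A)_j: gain of column j for the maximizing column player
    return [sum(x[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]


def duality_gap(A: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    return max(vec_mat(x, A)) - min(mat_vec(A, y))


def softmax(z: Sequence[float]) -> Vector:
    m = max(z)  # shift by the max for numerical stability
    w = [math.exp(v - m) for v in z]
    s = sum(w)
    return [v / s for v in w]


def project_simplex(v: Sequence[float]) -> Vector:
    # Sort-based Euclidean projection onto the probability simplex
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        css += uk
        t = (css - 1.0) / k
        if uk - t > 0:
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def omwu(A: Matrix, eta: float, T: int) -> Vector:
    """OFTRL with negative entropy; the prediction is the last observed loss/gain."""
    n, m = len(A), len(A[0])
    cum_loss, cum_gain = [0.0] * n, [0.0] * m
    last_loss, last_gain = [0.0] * n, [0.0] * m
    gaps = []
    for _ in range(T):
        x = softmax([-eta * (c + l) for c, l in zip(cum_loss, last_loss)])
        y = softmax([eta * (c + g) for c, g in zip(cum_gain, last_gain)])
        gaps.append(duality_gap(A, x, y))  # gap of the iterate actually played
        last_loss, last_gain = mat_vec(A, y), vec_mat(x, A)
        cum_loss = [c + l for c, l in zip(cum_loss, last_loss)]
        cum_gain = [c + g for c, g in zip(cum_gain, last_gain)]
    return gaps


def ogda(A: Matrix, eta: float, T: int) -> Vector:
    """Projected OGDA in two-sequence form (x_hat, y_hat are the secondary iterates)."""
    n, m = len(A), len(A[0])
    x_hat, y_hat = [1.0 / n] * n, [1.0 / m] * m
    last_loss, last_gain = [0.0] * n, [0.0] * m
    gaps = []
    for _ in range(T):
        x = project_simplex([a - eta * l for a, l in zip(x_hat, last_loss)])
        y = project_simplex([b + eta * g for b, g in zip(y_hat, last_gain)])
        gaps.append(duality_gap(A, x, y))
        last_loss, last_gain = mat_vec(A, y), vec_mat(x, A)
        x_hat = project_simplex([a - eta * l for a, l in zip(x_hat, last_loss)])
        y_hat = project_simplex([b + eta * g for b, g in zip(y_hat, last_gain)])
    return gaps


# Hypothetical 2x2 game for illustration only (not claimed to be the paper's A_delta)
delta = 0.1
A_hyp = [[0.5 + delta, 0.5], [0.0, 1.0]]
omwu_gaps = omwu(A_hyp, eta=0.1, T=3000)
ogda_gaps = ogda(A_hyp, eta=0.1, T=3000)
print(f"gap after 3000 rounds: OMWU={omwu_gaps[-1]:.2e}, OGDA={ogda_gaps[-1]:.2e}")
```

OMWU's iterate depends on the entire cumulative loss, whereas OGDA's iterate depends only on the previous secondary iterate and the most recent gradient. This is the "forgetting" distinction the abstract draws.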

Paper Structure (35 sections, 15 theorems, 47 equations, 4 figures)

Key Result

Theorem 1

For OMWU with constant step size, there is no function $f$ such that the corresponding learning dynamics $\{(x^t,y^t)\}_{t\geq 1}$ in two-player zero-sum games with payoff matrices in $[0,1]^{d_1 \times d_2}$ have a last-iterate convergence rate of $f(d_1,d_2, T)$. Under the same condition, OGDA has a last-iterate convergence …
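Under one natural reading of "last-iterate convergence rate" (an assumption here, since the precise definition is not reproduced on this page), Theorem 1 follows from the abstract's 2×2 lower bound by a short argument, sketched below for $d_1 = d_2 = 2$. The TL;DR's duplication-and-scaling reduction is what carries the result to larger dimensions. The constant $c$ and the round $t_\delta$ are placeholders for the quantities in the paper's lower bound.

```latex
% Assumed definition: the dynamics have last-iterate rate f if
\mathrm{Gap}_A(x^T, y^T) \le f(d_1, d_2, T)
  \quad \forall A \in [0,1]^{d_1 \times d_2},\ \forall T \ge 1,
  \qquad \text{with } \lim_{T \to \infty} f(d_1, d_2, T) = 0.

% Assumed form of the lower bound: for every \delta > 0 there are a game
% A_\delta \in [0,1]^{2 \times 2}, a constant c > 0 independent of \delta,
% and a round t_\delta \ge 1/\delta such that
\mathrm{Gap}_{A_\delta}(x^{t_\delta}, y^{t_\delta}) \ge c.

% Take T_0 with f(2, 2, T) < c for all T \ge T_0, and set \delta = 1/T_0.
% Then t_\delta \ge T_0, so
c \le \mathrm{Gap}_{A_\delta}(x^{t_\delta}, y^{t_\delta}) \le f(2, 2, t_\delta) < c,
% a contradiction; hence no such f exists for d_1 = d_2 = 2.
```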

Figures (4)

  • Figure 1: Comparison of the dynamics produced by three variants of OFTRL with different regularizers (negative entropy, logarithmic regularizer, and squared Euclidean norm) and OGDA in the same game $A_\delta$ with $\delta := 10^{-2}$. The bottom row shows the duality gap achieved by the last iterates. The OFTRL variants exhibit poor performance due to their lack of forgetfulness, while OGDA converges quickly to the Nash equilibrium. Since the regularizers in the first two plots are Legendre, the dynamics are equivalent to the ones produced by optimistic OMD with the respective Bregman divergences. In the plot for OMWU, we observe that $x^t[1]$ can get extremely close to the boundary (e.g., in the range $1 - e^{-50} < x^t[1] < 1$). To correctly simulate the dynamics, we used 1000 digits of precision. The red star, blue dot, and green square illustrate the key times $T_1$, $T_2$, $T_3$ defined in the analysis section.
  • Figure 2: Performance of OMWU on the game $A_\delta$ for three choices of $\delta$. In all plots, the learning rate was set to $\eta = 0.1$. As predicted by our analysis, the length of the "flat region" between iteration $T_1$ (red star) and $T_2$ (blue dot) is inversely proportional to $\delta$.
  • Figure 3: Pictorial depiction of the three stages of the OFTRL dynamics in the game $A_\delta$. The point $z^*$ denotes the unique Nash equilibrium. The times $T_1$ and $T_2$ are shown for concrete instantiations of OFTRL in Figure 1 by a red star and a blue dot, respectively. The times $T_s$ and $T_h$ are defined in the proof of the main theorem in the appendix.
  • Figure 4: Comparison of the dynamics produced by three variants of OFTRL with different regularizers (negative entropy, logarithmic regularizer, and squared Euclidean norm) and OGDA in the same game $A_\delta$ with $\delta := 10^{-2}$ and adaptive step size with $\epsilon = 0.1$. The bottom row shows the duality gap achieved by the iterates.
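Figure 1's caption notes that the OMWU iterate $x^t[1]$ can come within about $e^{-50}$ of the boundary, which is why high precision was needed. With two actions, a softmax (entropy-regularized) iterate satisfies $x[1] = 1/(1+e^{-g})$, where $g$ is the difference of the two scaled logits. In float64, once $e^{-g}$ drops below roughly $10^{-16}$, $x[1]$ rounds to exactly 1.0. The Python sketch below uses the standard-library `decimal` module as one way to get 1000 digits. The paper's actual tooling is not stated, and the function names are hypothetical. The sketch also shows the alternative of computing the complement $1 - x[1] = 1/(1+e^{g})$ directly.

```python
import math
from decimal import Decimal, localcontext


def first_prob_float(gap: float) -> float:
    # Naive float64: x[1] = 1 / (1 + e^{-gap}); rounds to 1.0 for large gap
    return 1.0 / (1.0 + math.exp(-gap))


def complement_float(gap: float) -> float:
    # 1 - x[1] = 1 / (1 + e^{gap}), computed directly so it stays representable
    return 1.0 / (1.0 + math.exp(gap))


def complement_decimal(gap: int, digits: int = 1000) -> Decimal:
    # Arbitrary precision via the stdlib decimal module (hypothetical helper)
    with localcontext() as ctx:
        ctx.prec = digits
        x1 = Decimal(1) / (Decimal(1) + (-Decimal(gap)).exp())
        return Decimal(1) - x1


gap = 50  # logit difference; 1 - x[1] is then about e^{-50}
print(1.0 - first_prob_float(gap))     # 0.0: the distance to the boundary is lost
print(complement_float(gap))           # about 1.93e-22
print(float(complement_decimal(gap)))  # about 1.93e-22
```

Tracking the complement or the logits is a cheap fix when only this one quantity is at risk. Arbitrary precision is slower but does not require rewriting every update in a numerically stable form.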

Theorems & Definitions (30)

  • Theorem : Informal
  • Lemma 1: Monotonicity of $F_{\eta, R}$
  • Proposition 1
  • Proposition 2
  • Lemma 2
  • Theorem 1
  • Theorem 2
  • Lemma 3
  • proof
  • Theorem 3
  • ...and 20 more