Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms
Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng
TL;DR
The work shows that forgetfulness is necessary for fast last-iterate convergence of learning dynamics in two-player zero-sum games. Using a simple parametric $2\times 2$ game $A_\delta$, it proves that OMWU and other optimistic follow-the-regularized-leader (OFTRL) algorithms can maintain a constant duality gap for on the order of $c_1/(\eta L \delta)$ rounds. Because $\delta$ can be made arbitrarily small, this rules out any game-independent last-iterate rate for these methods. This contrasts with OGDA, which achieves an $O(1/\sqrt{T})$ last-iterate rate. The negative result extends to higher dimensions via a reduction based on duplication and scaling. The findings imply that fast last-iterate convergence requires algorithms that forget the past quickly, which points future algorithm design and analysis toward forgetful online learning frameworks.
Abstract
Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games, both in theory and practice. Particularly popular algorithms include optimistic multiplicative weights update (OMWU) and optimistic gradient-descent-ascent (OGDA). While both algorithms enjoy $O(1/T)$ ergodic convergence to Nash equilibrium in two-player zero-sum games, OMWU offers several advantages, including logarithmic dependence on the size of the payoff matrix and $\widetilde{O}(1/T)$ convergence to coarse correlated equilibria even in general-sum games. However, in terms of last-iterate convergence in two-player zero-sum games, an increasingly popular topic in this area, OGDA guarantees that the duality gap shrinks at a rate of $O(1/\sqrt{T})$, while the best existing last-iterate convergence rate for OMWU depends on some game-dependent constant that could be arbitrarily large. This raises the question: is this potentially slow last-iterate convergence an inherent disadvantage of OMWU, or is the current analysis too loose? Somewhat surprisingly, we show that the former is true. More generally, we prove that a broad class of algorithms that do not forget the past quickly all suffer from the same issue: for any arbitrarily small $\delta>0$, there exists a $2\times 2$ matrix game such that the algorithm admits a constant duality gap even after $1/\delta$ rounds. This class of algorithms includes OMWU and other standard optimistic follow-the-regularized-leader algorithms.
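To make the quantities in the abstract concrete, the following is a minimal sketch of OMWU, written in its optimistic follow-the-regularized-leader form with an entropy regularizer, run in self-play on a $2\times 2$ matrix game while tracking the duality gap of the current iterate. The payoff matrix `A_demo`, the step size `eta`, and the horizon `T` are illustrative assumptions; in particular, `A_demo` is not the hard instance $A_\delta$ from the paper, whose exact entries are not given in this summary.

```python
import math


def duality_gap(A, x, y):
    """Duality gap of (x, y) for the game min_x max_y x^T A y on simplices."""
    n, m = len(A), len(A[0])
    x_payoffs = [sum(x[i] * A[i][j] for i in range(n)) for j in range(m)]  # x^T A
    y_payoffs = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]  # A y
    # Best response of the max player minus best response of the min player.
    return max(x_payoffs) - min(y_payoffs)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def omwu_self_play(A, eta, T):
    """Run OMWU (entropy-regularized OFTRL) for T rounds; return per-round gaps."""
    n, m = len(A), len(A[0])
    cum_x = [0.0] * n   # cumulative losses seen by the min player
    cum_y = [0.0] * m   # cumulative gains seen by the max player
    last_x = [0.0] * n  # most recent loss vector, used as the optimistic prediction
    last_y = [0.0] * m
    gaps = []
    for _ in range(T):
        # The whole history enters through cum_x / cum_y with equal weight:
        # this is the "non-forgetful" structure the abstract refers to.
        x = softmax([-eta * (cum_x[i] + last_x[i]) for i in range(n)])
        y = softmax([eta * (cum_y[j] + last_y[j]) for j in range(m)])
        gaps.append(duality_gap(A, x, y))
        last_x = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        last_y = [sum(x[i] * A[i][j] for i in range(n)) for j in range(m)]
        cum_x = [cum_x[i] + last_x[i] for i in range(n)]
        cum_y = [cum_y[j] + last_y[j] for j in range(m)]
    return gaps


# Hypothetical, well-conditioned example game (NOT the paper's A_delta).
A_demo = [[2.0, -1.0], [-1.0, 1.0]]
gaps = omwu_self_play(A_demo, eta=0.05, T=5000)
print("first gap:", gaps[0], "last gap:", gaps[-1])
```

On a benign game such as this one the last-iterate gap shrinks quickly. The abstract's claim is that for every arbitrarily small $\delta$ there is a $2\times 2$ game on which the same update keeps a constant gap for $1/\delta$ rounds, so plotting `gaps` for a candidate hard instance is one way to explore that behavior empirically.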
