On Separation Between Best-Iterate, Random-Iterate, and Last-Iterate Convergence of Learning in Games
Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng
TL;DR
The paper investigates non-ergodic convergence notions for learning in two-player zero-sum games, focusing on last-, random-, and best-iterate convergence measured by the duality gap. It proves a separation: OMWU exhibits a polynomial uniform best-iterate rate $O(T^{-\frac{1}{6}})$ in $2\times2$ games with a fully mixed Nash equilibrium, while it has no polynomial uniform random-iterate rate and lacks uniform last-iterate convergence. The authors introduce a novel two-phase analysis connecting dynamic regret and interval regret to establish the best-iterate bound, with a global phase controlling random-iterate behavior and an initial phase achieving fast convergence toward $O(\delta)$ duality gap. These results refine the understanding of how different convergence notions relate and demonstrate that uniform best-iterate guarantees can hold without corresponding random-iterate guarantees, offering new techniques for analyzing best-iterate performance. The findings have implications for designing and analyzing learning dynamics in games, particularly in understanding when subsequential best performance can be achieved independently of average or random performance.
Abstract
Non-ergodic convergence of learning dynamics in games has been widely studied in recent years because of its importance in both theory and practice. Recent work (Cai et al., 2024) showed that a broad class of learning dynamics, including Optimistic Multiplicative Weights Update (OMWU), can exhibit arbitrarily slow last-iterate convergence even in simple $2 \times 2$ matrix games, despite many of these dynamics being known to converge asymptotically in the last iterate. It remains unclear, however, whether these algorithms achieve fast non-ergodic convergence under weaker criteria, such as best-iterate convergence. We show that for $2\times 2$ matrix games, OMWU achieves an $O(T^{-1/6})$ best-iterate convergence rate, in stark contrast to its slow last-iterate convergence in the same class of games. Furthermore, we establish a lower bound showing that OMWU does not achieve any polynomial random-iterate convergence rate, measured by the expected duality gap of an iterate drawn uniformly at random from all iterates. This result challenges the conventional wisdom that random-iterate convergence is essentially equivalent to best-iterate convergence, with the former often used as a proxy for establishing the latter. Our analysis uncovers a new connection to dynamic regret and presents a novel two-phase approach to best-iterate convergence, which could be of independent interest.
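To make the three convergence notions concrete, the following is a minimal sketch that runs OMWU on a $2 \times 2$ matrix game and reports the best-iterate, random-iterate, and last-iterate duality gaps. Several choices here are assumptions for illustration and are not taken from the paper: the payoff matrix `A` (chosen to have a fully mixed equilibrium), the step size `eta`, the uniform initialization, and the single-sequence form of the OMWU update, $x_{t+1}(i) \propto x_t(i)\exp(-\eta(2\ell_t(i) - \ell_{t-1}(i)))$. The helper names `duality_gap`, `omwu_step`, and `omwu_gaps` are hypothetical.

```python
import math

def duality_gap(A, x, y):
    """Duality gap of (x, y) when the row player minimizes x^T A y."""
    n, m = len(A), len(A[0])
    Ay = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
    xA = [sum(x[i] * A[i][j] for i in range(n)) for j in range(m)]
    # Best response value of the maximizer minus that of the minimizer.
    return max(xA) - min(Ay)

def omwu_step(p, loss, prev_loss, eta):
    """One OMWU update (assumed form): reweight by exp(-eta * (2*loss - prev_loss))."""
    w = [p[i] * math.exp(-eta * (2 * loss[i] - prev_loss[i])) for i in range(len(p))]
    total = sum(w)
    return [wi / total for wi in w]

def omwu_gaps(A, T, eta=0.1):
    """Run OMWU for T rounds from uniform strategies; return the gap of each iterate."""
    n, m = len(A), len(A[0])
    x, y = [1.0 / n] * n, [1.0 / m] * m
    prev_lx, prev_ly = [0.0] * n, [0.0] * m
    gaps = []
    for _ in range(T):
        gaps.append(duality_gap(A, x, y))
        # Loss vectors: the row player minimizes A y, the column player maximizes x^T A.
        lx = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        ly = [-sum(x[i] * A[i][j] for i in range(n)) for j in range(m)]
        x, y = omwu_step(x, lx, prev_lx, eta), omwu_step(y, ly, prev_ly, eta)
        prev_lx, prev_ly = lx, ly
    return gaps

# Illustrative game (an assumption): unique equilibrium x* = y* = (2/5, 3/5).
A = [[2.0, -1.0], [-1.0, 1.0]]
gaps = omwu_gaps(A, T=500)

best_iterate = min(gaps)                 # smallest gap over all iterates
random_iterate = sum(gaps) / len(gaps)   # expected gap of a uniformly random iterate
last_iterate = gaps[-1]                  # gap of the final iterate
print(best_iterate, random_iterate, last_iterate)
```

By construction the best-iterate gap is never larger than the random-iterate or last-iterate gap. A single run on one game says nothing about uniform rates over the whole class of games, which is what the separation results concern.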
