From Average-Iterate to Last-Iterate Convergence in Games: A Reduction and Its Applications

Yang Cai, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng

TL;DR

This work introduces A2L, a black-box reduction that converts average-iterate convergence into last-iterate convergence for uncoupled learning dynamics in games with linear utilities, including two-player bimatrix games and multi-player polymatrix games. Applied to Optimistic Multiplicative Weights Update in multi-player zero-sum polymatrix games, A2L yields a last-iterate rate of $O\left(\frac{\log d}{T}\right)$ under gradient feedback, an exponential improvement in the dependence on the dimension $d$, and a last-iterate rate of $\widetilde{O}\left(d^{1/5} T^{-1/5}\right)$ under bandit feedback, improving on the previous best rates of $\widetilde{O}\left(\sqrt{d}\, T^{-1/8}\right)$ and $\widetilde{O}\left(\sqrt{d}\, T^{-1/6}\right)$. The reduction preserves uncoupledness and extends to nonlinear utilities under certain structures (e.g., proportional response dynamics (PRD) in Fisher markets), offering a simple route to anytime last-iterate guarantees with implications for dynamic regret.

Abstract

The convergence of online learning algorithms in games under self-play is a fundamental question in game theory and machine learning. Among various notions of convergence, last-iterate convergence is particularly desirable, as it reflects the actual decisions made by the learners and captures the day-to-day behavior of the learning dynamics. While many algorithms are known to converge in the average-iterate, achieving last-iterate convergence typically requires considerably more effort in both the design and the analysis of the algorithm. Somewhat surprisingly, we show in this paper that for a large family of games, there exists a simple black-box reduction that transforms the average iterates of an uncoupled learning dynamics into the last iterates of a new uncoupled learning dynamics, thus also providing a reduction from last-iterate convergence to average-iterate convergence. Our reduction applies to games where each player's utility is linear in both their own strategy and the joint strategy of all opponents. This family includes two-player bimatrix games and generalizations such as multi-player polymatrix games. By applying our reduction to the Optimistic Multiplicative Weights Update algorithm, we obtain new state-of-the-art last-iterate convergence rates for uncoupled learning dynamics in multi-player zero-sum polymatrix games: (1) an $O(\frac{\log d}{T})$ last-iterate convergence rate under gradient feedback, representing an exponential improvement in the dependence on the dimension $d$ (i.e., the maximum number of actions available to either player); and (2) an $\widetilde{O}(d^{\frac{1}{5}} T^{-\frac{1}{5}})$ last-iterate convergence rate under bandit feedback, improving upon the previous best rates of $\widetilde{O}(\sqrt{d} T^{-\frac{1}{8}})$ and $\widetilde{O}(\sqrt{d} T^{-\frac{1}{6}})$.
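To make the gap between the two notions of convergence concrete, the following minimal sketch runs Optimistic Multiplicative Weights Update (OMWU) in self-play on a small two-player zero-sum matrix game and reports the duality gap of both the average iterate and the last iterate. It does not implement the A2L reduction, whose construction is not described in this summary. The payoff matrix, the step size `eta`, and all function names are illustrative assumptions rather than anything taken from the paper.

```python
import math


def matvec(M, v):
    """Return the matrix-vector product M v for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def transpose(M):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*M)]


def normalize(w):
    """Rescale nonnegative weights so that they sum to one."""
    total = sum(w)
    return [v / total for v in w]


def duality_gap(A, x, y):
    """Gap of (x, y) when the row player maximizes x^T A y and the
    column player minimizes it: max_i (A y)_i - min_j (A^T x)_j."""
    return max(matvec(A, y)) - min(matvec(transpose(A), x))


def omwu_step(p, grad, prev_grad, eta):
    """One OMWU update: p_i <- p_i * exp(eta * (2 * grad_i - prev_grad_i))."""
    return normalize(
        [pi * math.exp(eta * (2.0 * g - g0))
         for pi, g, g0 in zip(p, grad, prev_grad)]
    )


def omwu_self_play(A, T, eta=0.1):
    """Run T rounds of uncoupled OMWU self-play (illustrative helper).

    Each player only sees the gradient of its own utility.
    Returns (avg_x, avg_y, last_x, last_y) over the T played strategies.
    """
    n, m = len(A), len(A[0])
    At = transpose(A)
    x, y = [1.0 / n] * n, [1.0 / m] * m
    gx_prev, gy_prev = [0.0] * n, [0.0] * m
    sum_x, sum_y = [0.0] * n, [0.0] * m
    last_x, last_y = x, y
    for _ in range(T):
        last_x, last_y = x, y
        sum_x = [s + v for s, v in zip(sum_x, x)]
        sum_y = [s + v for s, v in zip(sum_y, y)]
        gx = matvec(A, y)                  # row player's utility gradient
        gy = [-v for v in matvec(At, x)]   # column player's utility gradient
        x = omwu_step(x, gx, gx_prev, eta)
        y = omwu_step(y, gy, gy_prev, eta)
        gx_prev, gy_prev = gx, gy
    avg_x = [s / T for s in sum_x]
    avg_y = [s / T for s in sum_y]
    return avg_x, avg_y, last_x, last_y


if __name__ == "__main__":
    # Hypothetical 2x2 game whose unique equilibrium is x = y = (0.4, 0.6).
    A = [[2.0, -1.0], [-1.0, 1.0]]
    avg_x, avg_y, last_x, last_y = omwu_self_play(A, T=2000, eta=0.1)
    print("gap of average iterate:", duality_gap(A, avg_x, avg_y))
    print("gap of last iterate:   ", duality_gap(A, last_x, last_y))
```

The average-iterate gap is controlled by the players' regrets, which is why standard regret analysis suffices for it; the last-iterate gap has no such generic bound, which is the difficulty a reduction of the kind described in the abstract is meant to address.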

Paper Structure

This paper contains 23 sections, 12 theorems, 50 equations, 2 algorithms.

Key Result

Lemma 1

Let $\{x^t\}$ be the iterates of an online learning dynamics in a zero-sum polymatrix game. Define $\overline{x}^T = \frac{1}{T}\sum_{t=1}^T x^t$ to be the average iterate for all $T \ge 1$. Then the total gap of the average iterate $\overline{x}^T$ for any $T \ge 1$ is

Theorems & Definitions (23)

  • Example 1: Two-Player Bimatrix Games
  • Example 2: Multi-Player Polymatrix Games
  • Lemma 1: Average-Iterate Convergence by Bounding Regret
  • Theorem 1: Proposition 7 of Syrgkanis et al. (2015)
  • Lemma 2: Adapted from Theorem 4 of Syrgkanis et al. (2015)
  • Theorem 2
  • Proof: Proof of the theorem labeled thm:reduction
  • Theorem 3
  • Corollary 1
  • Theorem 4
  • ...and 13 more