Faster Rates for No-Regret Learning in General Games via Cautious Optimism

Ashkan Soleymani, Georgios Piliouras, Gabriele Farina

TL;DR

This work tackles the problem of no-regret learning in n-player general-sum games by introducing an uncoupled algorithm that achieves $O\left(n \log^2 d \log T\right)$ per-player regret in self-play. The core method, Dynamic Learning Rate Control Optimistic MWU (DLRC-OMWU), blends Optimistic MWU with a per-step learning-rate optimization that slows learning when a player's maximum regret becomes overly negative, yielding a nonmonotone yet stable pacing mechanism. The authors develop a novel regularizer $\psi$ with strong spectral properties and show multiple equivalent formulations (OFTRL/FTRL) that underpin the regret analysis, including a kernelized extension (KDLRC-OMWU) that handles 0/1-polyhedral games. Across both self-play and adversarial settings, the framework delivers fast convergence to coarse correlated equilibria and provides anytime guarantees without horizon-doubling tricks, marking a substantial improvement over prior polylogarithmic-in-$T$ bounds. The results have broad implications for fast, uncoupled learning in strategic environments and open avenues for applying dynamic pacing to other FTRL/OMWU-based methods.

Abstract

We establish the first uncoupled learning algorithm that attains $O(n \log^2 d \log T)$ per-player regret in multi-player general-sum games, where $n$ is the number of players, $d$ is the number of actions available to each player, and $T$ is the number of repetitions of the game. Our results exponentially improve the dependence on $d$ compared to the $O(n\, d \log T)$ regret attainable by Log-Regularized Lifted Optimistic FTRL [Far+22c], and also reduce the dependence on the number of iterations $T$ from $\log^4 T$ to $\log T$ compared to Optimistic Hedge, the previously well-studied algorithm with $O(n \log d \log^4 T)$ regret [DFG21]. Our algorithm is obtained by combining the classic Optimistic Multiplicative Weights Update (OMWU) with an adaptive, non-monotonic learning rate that paces the learning process of the players, making them more cautious when their regret becomes too negative.
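To make the base method concrete, here is a minimal Python sketch of plain Optimistic MWU in two-player self-play, with the regret computed as the best fixed action in hindsight minus the utility actually earned. It is an illustration under stated assumptions, not the paper's algorithm: the learning rate `eta` is a fixed constant, whereas DLRC-OMWU would choose it anew each round through an optimization problem that this summary does not reproduce. The function names (`softmax`, `omwu_strategy`, `self_play`), the bimatrix setup, and the use of the last observed utility vector as the prediction are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def omwu_strategy(cum_utility, prediction, eta):
    """OMWU iterate: weights proportional to exp(eta * (cumulative + predicted utility))."""
    return softmax([eta * (c + p) for c, p in zip(cum_utility, prediction)])


def self_play(payoff_row, payoff_col, rounds, eta=0.1):
    """Run OMWU for both players of a bimatrix game and return their regrets.

    payoff_row[i][j] and payoff_col[i][j] are the utilities of the row and
    column player when the row player picks action i and the column player j.
    NOTE: eta is constant here (an assumption); DLRC-OMWU adapts it per round.
    """
    d_row, d_col = len(payoff_row), len(payoff_row[0])
    cum_row, cum_col = [0.0] * d_row, [0.0] * d_col    # cumulative utility vectors
    pred_row, pred_col = [0.0] * d_row, [0.0] * d_col  # prediction = last utility seen
    earned_row = earned_col = 0.0
    for _ in range(rounds):
        x = omwu_strategy(cum_row, pred_row, eta)
        y = omwu_strategy(cum_col, pred_col, eta)
        # Expected utility of each pure action against the opponent's mixed strategy.
        u_row = [sum(payoff_row[i][j] * y[j] for j in range(d_col)) for i in range(d_row)]
        u_col = [sum(payoff_col[i][j] * x[i] for i in range(d_row)) for j in range(d_col)]
        earned_row += sum(p * u for p, u in zip(x, u_row))
        earned_col += sum(p * u for p, u in zip(y, u_col))
        cum_row = [c + u for c, u in zip(cum_row, u_row)]
        cum_col = [c + u for c, u in zip(cum_col, u_col)]
        pred_row, pred_col = u_row, u_col
    # External regret: best fixed action in hindsight minus utility actually earned.
    return max(cum_row) - earned_row, max(cum_col) - earned_col


# Hypothetical 2x2 zero-sum game used only for illustration.
A = [[1.0, -1.0], [-0.5, 0.5]]
B = [[-a for a in row] for row in A]
regret_row, regret_col = self_play(A, B, rounds=500)
```

The algorithm is uncoupled in the sense that each player's update uses only that player's own utility vectors; the dynamic learning-rate control described in the abstract would replace the constant `eta` inside `omwu_strategy`.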

Paper Structure

This paper contains 31 sections, 43 theorems, 147 equations, 1 figure, 1 table, 2 algorithms.

Key Result

Theorem 3.1

Suppose that $n$ players self-play a general-sum multiplayer game with a finite set of $d$ deterministic strategies per player over $T$ rounds. Further, suppose that each player follows DLRC-OMWU to choose their action based on the history so far. Then, each player incurs $O(n \log^2 d \log T)$ regret.

Figures (1)

  • Figure 1: Learning rate landscape: Dependence of $\lambda^{(t)}$, as defined in \ref{eq:opt_problem_lambda}, on the optimistic regrets accumulated on the $2$-action simplex. For the plot, the values $\eta=1, \alpha=4$ were chosen.

Theorems & Definitions (72)

  • Definition 2.2: RVU property [Syrgkanis15:Fast]
  • Theorem 3.1: Informal; see \ref{theorem:regret_bound} for the detailed version
  • Corollary 3.2
  • Theorem 3.3
  • Remark 3.4
  • Theorem 3.5
  • Corollary 3.6
  • Definition 4.1: 0/1-polyhedral feature map and kernel [farina2022kernelized]
  • Proposition 4.2: Theorems 4.1 and 4.2 of [farina2022kernelized]
  • Proposition 4.3
  • ...and 62 more