Faster Rates for No-Regret Learning in General Games via Cautious Optimism

Ashkan Soleymani; Georgios Piliouras; Gabriele Farina

TL;DR

This work tackles the problem of no-regret learning in $n$-player general-sum games by introducing an uncoupled algorithm that achieves $O\left(n \log^2 d \log T\right)$ per-player regret in self-play. The core method, Dynamic Learning Rate Control Optimistic MWU (DLRC-OMWU), blends Optimistic MWU with a per-step learning-rate optimization that slows learning when a player's maximum regret becomes overly negative, yielding a nonmonotone yet stable pacing mechanism. The authors develop a novel regularizer $\psi$ with strong spectral properties and show multiple equivalent formulations (OFTRL/FTRL) that underpin the regret analysis, including a kernelized extension (KDLRC-OMWU) that handles 0/1-polyhedral games. Across both self-play and adversarial settings, the framework delivers fast convergence to coarse correlated equilibria and provides anytime guarantees without horizon-doubling tricks, marking a substantial improvement over prior polylogarithmic-in-$T$ bounds. The results have broad implications for fast, uncoupled learning in strategic environments and open avenues for applying dynamic pacing to other FTRL/OMWU-based methods.

Abstract

We establish the first uncoupled learning algorithm that attains $O(n \log^2 d \log T)$ per-player regret in multi-player general-sum games, where $n$ is the number of players, $d$ is the number of actions available to each player, and $T$ is the number of repetitions of the game. Our results exponentially improve the dependence on $d$ compared to the $O(n\, d \log T)$ regret attainable by Log-Regularized Lifted Optimistic FTRL [Far+22c], and also reduce the dependence on the number of iterations $T$ from $\log^4 T$ to $\log T$ compared to Optimistic Hedge, the previously well-studied algorithm with $O(n \log d \log^4 T)$ regret [DFG21]. Our algorithm is obtained by combining the classic Optimistic Multiplicative Weights Update (OMWU) with an adaptive, non-monotonic learning rate that paces the learning process of the players, making them more cautious when their regret becomes too negative.
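To make the baseline concrete, the following is a minimal sketch of the classic OMWU update in two-player self-play, with each player's external regret measured at the end. It is not the paper's DLRC-OMWU: the learning rate `eta` is held fixed here, whereas the paper adapts it at every step, and the exact form of that adaptation is not given in this summary. The function names `omwu_strategy` and `self_play`, the payoff matrices `A` and `B`, and the default values of `T` and `eta` are all illustrative assumptions.

```python
import numpy as np


def omwu_strategy(cum_utility, last_utility, eta):
    """Classic OMWU step with a FIXED learning rate (assumption: no DLRC pacing).

    The most recent utility vector is counted twice: once inside the
    cumulative sum and once as the optimistic prediction of the next one.
    """
    logits = eta * (cum_utility + last_utility)
    logits = logits - logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def self_play(A, B, T=1000, eta=0.1):
    """Run OMWU for both players of a bimatrix game (A, B).

    A[i, j] and B[i, j] are the payoffs of player 1 and player 2 when
    player 1 plays action i and player 2 plays action j.
    Returns the external regret of each player after T rounds.
    """
    d1, d2 = A.shape
    cum = [np.zeros(d1), np.zeros(d2)]   # cumulative utility vectors
    last = [np.zeros(d1), np.zeros(d2)]  # most recent utility vectors
    earned = [0.0, 0.0]                  # utility actually collected
    for _ in range(T):
        x = omwu_strategy(cum[0], last[0], eta)
        y = omwu_strategy(cum[1], last[1], eta)
        # Uncoupled feedback: each player sees only its own utility vector.
        utils = [A @ y, B.T @ x]
        earned[0] += float(x @ utils[0])
        earned[1] += float(y @ utils[1])
        for i in range(2):
            cum[i] = cum[i] + utils[i]
            last[i] = utils[i]
    # External regret: best fixed action in hindsight minus utility earned.
    return [float(cum[i].max() - earned[i]) for i in range(2)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((3, 3))  # hypothetical payoffs in [0, 1]
    B = rng.random((3, 3))
    print(self_play(A, B))
```

The update is uncoupled in the sense used above: each player needs only its own utility vector, never the other player's payoff matrix. According to the abstract, the paper's contribution lies in replacing the fixed `eta` with an adaptive, non-monotonic learning rate that makes a player more cautious when its regret becomes too negative.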
