Faster Rates for No-Regret Learning in General Games via Cautious Optimism
Ashkan Soleymani, Georgios Piliouras, Gabriele Farina
TL;DR
This work tackles no-regret learning in $n$-player general-sum games by introducing an uncoupled algorithm that achieves $O\left(n \log^2 d \log T\right)$ per-player regret in self-play. The core method, Dynamic Learning Rate Control Optimistic MWU (DLRC-OMWU), combines Optimistic MWU with a per-step learning-rate optimization that slows learning when a player's maximum regret becomes too negative, yielding a non-monotonic yet stable pacing mechanism. The authors develop a novel regularizer $\psi$ with strong spectral properties and show multiple equivalent formulations (OFTRL/FTRL) that underpin the regret analysis, along with a kernelized extension (KDLRC-OMWU) that handles 0/1-polyhedral games. Across both self-play and adversarial settings, the framework delivers fast convergence to coarse correlated equilibria and provides anytime guarantees without horizon-doubling tricks. Compared with prior bounds, it reduces the dependence on $T$ from $\log^4 T$ (Optimistic Hedge) to $\log T$, and the dependence on $d$ from linear (Log-Regularized Lifted Optimistic FTRL) to $\log^2 d$. The authors suggest that the same dynamic pacing idea may apply to other FTRL/OMWU-based methods.
Abstract
We establish the first uncoupled learning algorithm that attains $O(n \log^2 d \log T)$ per-player regret in multi-player general-sum games, where $n$ is the number of players, $d$ is the number of actions available to each player, and $T$ is the number of repetitions of the game. Our results exponentially improve the dependence on $d$ compared to the $O(n\, d \log T)$ regret attainable by Log-Regularized Lifted Optimistic FTRL [Far+22c], and also reduce the dependence on the number of iterations $T$ from $\log^4 T$ to $\log T$ compared to Optimistic Hedge, the previously well-studied algorithm with $O(n \log d \log^4 T)$ regret [DFG21]. Our algorithm is obtained by combining the classic Optimistic Multiplicative Weights Update (OMWU) with an adaptive, non-monotonic learning rate that paces the learning process of the players, making them more cautious when their regret becomes too negative.
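For orientation, the following is a minimal sketch of the classic OMWU baseline that the abstract builds on, run in self-play on a two-player general-sum game. It uses a fixed learning rate `eta` and does not reproduce the paper's dynamic learning rate control, whose exact rule is not stated here. The function names `omwu_strategy` and `self_play`, the two-player setup, and the choice of `eta` and `T` are all illustrative assumptions.

```python
import numpy as np

def omwu_strategy(cum_utility, last_utility, eta):
    """Classic OMWU step: softmax of eta * (cumulative utility + prediction).

    The prediction for the next utility vector is the most recent one.
    """
    logits = eta * (cum_utility + last_utility)
    logits = logits - logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def self_play(A, B, T=2000, eta=0.1):
    """Run fixed-eta OMWU for both players and return each player's regret.

    A[i, j] and B[i, j] are the row and column player's utilities
    (assumed to lie in [0, 1]) when actions i and j are played.
    NOTE: fixed eta is an assumption of this sketch, not the paper's method.
    """
    d_row, d_col = A.shape
    cum = [np.zeros(d_row), np.zeros(d_col)]   # cumulative utility vectors
    last = [np.zeros(d_row), np.zeros(d_col)]  # most recent utility vectors
    earned = [0.0, 0.0]                        # utility actually collected
    for _ in range(T):
        x = omwu_strategy(cum[0], last[0], eta)
        y = omwu_strategy(cum[1], last[1], eta)
        utils = [A @ y, B.T @ x]  # expected utility of each pure action
        earned[0] += float(x @ utils[0])
        earned[1] += float(y @ utils[1])
        for i in range(2):
            cum[i] = cum[i] + utils[i]
            last[i] = utils[i]
    # External regret: best fixed action in hindsight minus collected utility.
    return [float(cum[i].max() - earned[i]) for i in range(2)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.uniform(size=(3, 3))
    B = rng.uniform(size=(3, 3))
    print(self_play(A, B))
```

Regret can be negative in this setup, which is the regime in which, according to the abstract, the adaptive learning rate makes players more cautious.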
