Cautious Optimism: A Meta-Algorithm for Near-Constant Regret in General Games
Ashkan Soleymani, Georgios Piliouras, Gabriele Farina
TL;DR
Cautious Optimism introduces COFTRL, a meta-framework that converts any no-regret FTRL instance into an accelerated, uncoupled algorithm through non-monotone, data-driven learning-rate pacing. The approach yields near-constant social regret and $O_T(\log T)$ regret in self-play across general games, including convex games, with only modest computational overhead. Central to the method are intrinsic Lipschitz regularizers, a lifted OFTRL interpretation, and a dynamic learning-rate control problem that is strongly concave and often self-concordant, which makes the learning-rate updates efficient to compute. The paper develops several instantiations (COMWU, log-regularizer-based, and $q^*$-Tsallis variants) that improve on state-of-the-art regret bounds, and extends the framework to kernelized, convex, and 0/1-polyhedral games. Overall, COFTRL provides a principled, scalable path to fast learning in games with robust, uncoupled guarantees and broad applicability.
Abstract
We introduce Cautious Optimism, a framework for substantially faster regularized learning in general games. Cautious Optimism, as a variant of Optimism, adaptively controls the learning pace in a dynamic, non-monotone manner to accelerate no-regret learning dynamics. Cautious Optimism takes as input any instance of Follow-the-Regularized-Leader (FTRL) and outputs an accelerated no-regret learning algorithm (COFTRL) by pacing the underlying FTRL with minimal computational overhead. Importantly, it retains uncoupledness, that is, learners do not need to know other players' utilities. Cautious Optimistic FTRL (COFTRL) achieves near-optimal $O_T(\log T)$ regret in diverse self-play (mixing and matching regularizers) while preserving the optimal $O_T(\sqrt{T})$ regret in adversarial scenarios. In contrast to prior works (e.g., Syrgkanis et al. [2015], Daskalakis et al. [2021]), our analysis does not rely on monotonic step sizes, showcasing a novel route for fast learning in general games. Moreover, instances of COFTRL achieve new state-of-the-art regret minimization guarantees in general convex games, exponentially improving the dependence on the dimension of the action space $d$ over previous works [Farina et al., 2022a].
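For orientation, the kind of object that Cautious Optimism takes as input can be sketched in a few lines. The block below is a minimal illustration of standard optimistic FTRL with a negative-entropy regularizer (optimistic multiplicative weights) and a fixed learning rate, run in uncoupled self-play on a two-player zero-sum game. It is not COFTRL: the dynamic learning-rate pacing described above is not reproduced here. The function names, the payoff matrix, and the value of `eta` are assumptions chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def oftrl_entropy(cum_utility, prediction, eta):
    """One optimistic FTRL step on the simplex with negative entropy.

    Solves argmax_x <x, U + m> - (1/eta) * sum_i x_i log x_i, whose
    closed form is a softmax of eta * (U + m).
    """
    return softmax([eta * (u + m) for u, m in zip(cum_utility, prediction)])


def self_play(payoff, eta=0.1, rounds=1000, optimistic=True):
    """Zero-sum self-play; returns (row regret, column regret).

    The row player maximizes x^T A y and the column player minimizes it.
    Each learner only observes its own utility vector (uncoupled dynamics).
    """
    n, m = len(payoff), len(payoff[0])
    cum_row, cum_col = [0.0] * n, [0.0] * m
    last_row, last_col = [0.0] * n, [0.0] * m
    earned_row = earned_col = 0.0
    for _ in range(rounds):
        # Optimism: predict that the next utility equals the last one seen.
        pred_row = last_row if optimistic else [0.0] * n
        pred_col = last_col if optimistic else [0.0] * m
        x = oftrl_entropy(cum_row, pred_row, eta)
        y = oftrl_entropy(cum_col, pred_col, eta)
        u_row = [sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n)]
        u_col = [-sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m)]
        earned_row += sum(xi * ui for xi, ui in zip(x, u_row))
        earned_col += sum(yj * uj for yj, uj in zip(y, u_col))
        cum_row = [c + u for c, u in zip(cum_row, u_row)]
        cum_col = [c + u for c, u in zip(cum_col, u_col)]
        last_row, last_col = u_row, u_col
    # External regret: best fixed action in hindsight minus utility earned.
    return max(cum_row) - earned_row, max(cum_col) - earned_col


if __name__ == "__main__":
    game = [[2.0, -1.0], [-1.0, 1.0]]  # hypothetical 2x2 zero-sum payoff matrix
    print("optimistic:", self_play(game, optimistic=True))
    print("plain FTRL:", self_play(game, optimistic=False))
```

Setting `optimistic=False` recovers plain FTRL (multiplicative weights), which allows a side-by-side comparison of the two regret values on the same game. A paced variant in the spirit of the abstract would replace the constant `eta` with a learning rate recomputed each round from the observed data.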
