Scale-Invariant Fast Convergence in Games
Taira Tsuchiya, Haipeng Luo, Shinji Ito
TL;DR
The paper tackles fast convergence to game equilibria without prior knowledge of the utility scale by designing learning dynamics that are both scale-free and scale-invariant, for two-player zero-sum and multiplayer general-sum games. The dynamics use optimistic follow-the-regularized-leader with adaptive learning rates that incorporate gradient path lengths, plus a stopping-time analysis that exploits negative terms in regret bounds without knowing payoff ranges. In the two-player setting, the authors bound external regret in terms of the payoff range $A_{\mathrm{diff}}$, yielding an $O(A_{\mathrm{diff}} \log m / T)$ convergence rate to Nash equilibrium; in the multiplayer setting, each player's swap regret is bounded by $O(U_{\mathrm{max}} \log T)$, yielding an $O(U_{\mathrm{max}} \log T / T)$ convergence rate to correlated equilibrium. A new doubling clipping technique enables the scale-free, scale-invariant guarantees in general-sum games, and the results hold even under opponent deviations and certain corruption regimes. Overall, the work provides tuning-free learning dynamics for equilibrium computation whose guarantees do not depend on knowing the utility scale in advance.
Abstract
Scale-invariance in games has recently emerged as a widely valued desirable property. Yet, almost all fast convergence guarantees in learning in games require prior knowledge of the utility scale. To address this, we develop learning dynamics that achieve fast convergence while being both scale-free, requiring no prior information about utilities, and scale-invariant, remaining unchanged under positive rescaling of utilities. For two-player zero-sum games, we obtain scale-free and scale-invariant dynamics with external regret bounded by $\tilde{O}(A_{\mathrm{diff}})$, where $A_{\mathrm{diff}}$ is the payoff range, which implies an $\tilde{O}(A_{\mathrm{diff}} / T)$ convergence rate to Nash equilibrium after $T$ rounds. For multiplayer general-sum games with $n$ players and $m$ actions, we obtain scale-free and scale-invariant dynamics with swap regret bounded by $O(U_{\mathrm{max}} \log T)$, where $U_{\mathrm{max}}$ is the range of the utilities, ignoring the dependence on the number of players and actions. This yields an $O(U_{\mathrm{max}} \log T / T)$ convergence rate to correlated equilibrium. Our learning dynamics are based on optimistic follow-the-regularized-leader with an adaptive learning rate that incorporates the squared path length of the opponents' gradient vectors, together with a new stopping-time analysis that exploits negative terms in regret bounds without scale-dependent tuning. For general-sum games, scale-free learning is enabled also by a technique called doubling clipping, which clips observed gradients based on past observations.
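As an illustration of the ingredients named above, the following Python sketch runs optimistic FTRL with an entropy regularizer (optimistic Hedge) in self-play on a two-player zero-sum matrix game. It is a simplified sketch under stated assumptions, not the paper's algorithm: the learning rate $\sqrt{\log m / V_t}$, where $V_t$ is the sum of squared $\ell_\infty$ changes between consecutive observed loss vectors, is a stand-in for the paper's adaptive rule; the stopping-time analysis and doubling clipping are not modeled; and all names (`OptimisticHedge`, `self_play`, `duality_gap`) and the example matrix are hypothetical. The sketch uses no bound on the payoffs, and multiplying the payoff matrix by a positive constant scales the loss vectors up and the learning rate down by the same factor, so the iterates should not change.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


class OptimisticHedge:
    """Optimistic FTRL with an entropy regularizer (illustrative, not the paper's rule)."""

    def __init__(self, num_actions):
        self.m = num_actions
        self.cum_loss = [0.0] * num_actions   # sum of observed loss vectors
        self.last_loss = [0.0] * num_actions  # optimistic prediction of the next loss
        self.path_sq = 0.0                    # V_t: sum of squared l_inf loss changes

    def strategy(self):
        if self.path_sq == 0.0:
            # Every loss seen so far was the zero vector: play uniformly.
            return [1.0 / self.m] * self.m
        # Assumed adaptive learning rate; it needs no payoff bound.
        eta = math.sqrt(math.log(self.m) / self.path_sq)
        return softmax([-eta * (c + p)
                        for c, p in zip(self.cum_loss, self.last_loss)])

    def observe(self, loss):
        step = max(abs(a - b) for a, b in zip(loss, self.last_loss))
        self.path_sq += step * step
        self.cum_loss = [c + g for c, g in zip(self.cum_loss, loss)]
        self.last_loss = list(loss)


def self_play(A, rounds):
    """Both players run OptimisticHedge; returns their time-averaged strategies."""
    m, n = len(A), len(A[0])
    row, col = OptimisticHedge(m), OptimisticHedge(n)
    avg_x, avg_y = [0.0] * m, [0.0] * n
    for _ in range(rounds):
        x, y = row.strategy(), col.strategy()
        avg_x = [a + xi / rounds for a, xi in zip(avg_x, x)]
        avg_y = [a + yj / rounds for a, yj in zip(avg_y, y)]
        # The row player minimizes x^T A y, so its loss vector is A y.
        row.observe([sum(A[i][j] * y[j] for j in range(n)) for i in range(m)])
        # The column player maximizes x^T A y, so its loss vector is -A^T x.
        col.observe([-sum(A[i][j] * x[i] for i in range(m)) for j in range(n)])
    return avg_x, avg_y


def duality_gap(A, x, y):
    """max_y' x^T A y' - min_x' x'^T A y; zero exactly at a Nash equilibrium."""
    m, n = len(A), len(A[0])
    best_col = max(sum(A[i][j] * x[i] for i in range(m)) for j in range(n))
    best_row = min(sum(A[i][j] * y[j] for j in range(n)) for i in range(m))
    return best_col - best_row


if __name__ == "__main__":
    A = [[2.0, -1.0], [-1.0, 1.0]]  # hypothetical payoff matrix
    x, y = self_play(A, rounds=2000)
    print("average strategies:", x, y)
    print("duality gap:", duality_gap(A, x, y))
    # Rescaling the utilities should leave the strategies unchanged.
    scaled = [[1024.0 * a for a in row] for row in A]
    print("after rescaling:", self_play(scaled, rounds=2000))
```

The duality gap of the averaged strategies is at most the sum of the two players' external regrets divided by $T$, which is how a regret bound translates into a convergence rate to Nash equilibrium.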
