Scale-Invariant Fast Convergence in Games

Taira Tsuchiya; Haipeng Luo; Shinji Ito

TL;DR

The paper studies fast convergence to game equilibria without prior knowledge of the utility scale. It designs learning dynamics for both two-player zero-sum and multiplayer general-sum games that are scale-free (they need no prior information about utilities) and scale-invariant (they are unchanged under positive rescaling of utilities). The dynamics use optimistic follow-the-regularized-leader with an adaptive learning rate built from the squared path length of the opponents' gradients, plus a stopping-time analysis that exploits negative terms in regret bounds without knowing payoff ranges. In the two-player setting, the authors obtain external regret bounded by $\tilde{O}(A_{\mathrm{diff}})$, where $A_{\mathrm{diff}}$ is the payoff range, yielding a convergence rate of $O(A_{\mathrm{diff}} \log m / T)$ to Nash equilibrium. In the multiplayer setting, swap regret is bounded by $O(U_{\mathrm{max}} \log T)$ per player, yielding $O(U_{\mathrm{max}} \log T / T)$ convergence to correlated equilibrium. A new doubling clipping technique, which clips observed gradients based on past observations, enables the scale-free guarantee in general-sum games, and the results hold even under opponent deviations and certain corruption regimes. Overall, the work gives tuning-free dynamics for equilibrium computation whose guarantees do not depend on knowing the utility scale in advance.

Abstract

Scale-invariance in games has recently emerged as a widely valued desirable property. Yet, almost all fast convergence guarantees in learning in games require prior knowledge of the utility scale. To address this, we develop learning dynamics that achieve fast convergence while being both scale-free, requiring no prior information about utilities, and scale-invariant, remaining unchanged under positive rescaling of utilities. For two-player zero-sum games, we obtain scale-free and scale-invariant dynamics with external regret bounded by $\tilde{O}(A_{\mathrm{diff}})$, where $A_{\mathrm{diff}}$ is the payoff range, which implies an $\tilde{O}(A_{\mathrm{diff}} / T)$ convergence rate to Nash equilibrium after $T$ rounds. For multiplayer general-sum games with $n$ players and $m$ actions, we obtain scale-free and scale-invariant dynamics with swap regret bounded by $O(U_{\mathrm{max}} \log T)$, where $U_{\mathrm{max}}$ is the range of the utilities, ignoring the dependence on the number of players and actions. This yields an $O(U_{\mathrm{max}} \log T / T)$ convergence rate to correlated equilibrium. Our learning dynamics are based on optimistic follow-the-regularized-leader with an adaptive learning rate that incorporates the squared path length of the opponents' gradient vectors, together with a new stopping-time analysis that exploits negative terms in regret bounds without scale-dependent tuning. For general-sum games, scale-free learning is enabled also by a technique called doubling clipping, which clips observed gradients based on past observations.
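To make the idea of a scale-invariant adaptive learning rate concrete, here is a minimal sketch for the two-player zero-sum case. It is not the paper's algorithm. It assumes an entropy regularizer (optimistic Hedge), a learning rate of the form $\sqrt{\log m / P_{t-1}}$ where $P_{t-1}$ is the cumulative squared sup-norm path length of the observed loss vectors, and the convention that the $x$-player minimizes $x^\top A y$. The function names `optimistic_hedge`, `run_dynamics`, and `duality_gap` are hypothetical, and the stopping-time analysis and doubling clipping are not reproduced.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax."""
    z = z - np.max(z)
    w = np.exp(z)
    return w / w.sum()


def optimistic_hedge(cum_loss, prediction, path_sq):
    """One optimistic FTRL step with entropy regularizer (assumed form).

    The learning rate sqrt(log m / path_sq) is an illustrative choice:
    it scales like 1/alpha when all losses are multiplied by alpha > 0,
    so the product eta * loss, and hence the strategy, is unchanged.
    """
    m = cum_loss.shape[0]
    if path_sq == 0.0:
        # No nonzero loss seen yet, so every action looks identical.
        return np.full(m, 1.0 / m)
    eta = np.sqrt(np.log(m) / path_sq)
    return softmax(-eta * (cum_loss + prediction))


def run_dynamics(A, T):
    """Self-play for T rounds; returns the time-averaged strategies.

    Assumed convention: the x-player minimizes x^T A y, the y-player maximizes.
    """
    A = np.asarray(A, dtype=float)
    mx, my = A.shape
    Lx, Ly = np.zeros(mx), np.zeros(my)            # cumulative loss vectors
    gx_prev, gy_prev = np.zeros(mx), np.zeros(my)  # last losses, used as predictions
    Px = Py = 0.0                                  # cumulative squared path lengths
    x_sum, y_sum = np.zeros(mx), np.zeros(my)
    for _ in range(T):
        x = optimistic_hedge(Lx, gx_prev, Px)
        y = optimistic_hedge(Ly, gy_prev, Py)
        x_sum += x
        y_sum += y
        gx = A @ y        # loss vector of the x-player
        gy = -(A.T @ x)   # loss vector of the y-player (negated payoff)
        Px += np.max(np.abs(gx - gx_prev)) ** 2
        Py += np.max(np.abs(gy - gy_prev)) ** 2
        Lx += gx
        Ly += gy
        gx_prev, gy_prev = gx, gy
    return x_sum / T, y_sum / T


def duality_gap(A, x_bar, y_bar):
    """Nash gap: max_y x_bar^T A y - min_x x^T A y_bar (always >= 0)."""
    return np.max(x_bar @ A) - np.min(A @ y_bar)


if __name__ == "__main__":
    A = np.array([[3.0, -1.0, 0.0], [-2.0, 1.0, 2.0], [0.0, 2.0, -1.0]])
    x_bar, y_bar = run_dynamics(A, 2000)
    x_scaled, _ = run_dynamics(1000.0 * A, 2000)
    print("duality gap:", duality_gap(A, x_bar, y_bar))
    print("strategy change under rescaling:", np.max(np.abs(x_bar - x_scaled)))
```

Because no constant in the update depends on the magnitude of $A$, running the sketch on $A$ and on $\alpha A$ for any $\alpha > 0$ should produce the same strategies up to floating-point error, which is the scale-invariance property described above. A fixed learning rate such as $\eta = 0.1$ would not have this property.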

Paper Structure (47 sections, 32 theorems, 144 equations, 2 tables, 1 algorithm)

Introduction
Contributions of this paper
Two-player zero-sum games
Multiplayer general-sum games
Additional related work
Preliminaries
Notation and conventions
Online linear optimization
Setup
External regret and swap regret
Optimistic follow-the-regularized-leader
Two-player zero-sum games
Multiplayer general-sum games
Scale-invariant and scale-free learning dynamics
Scale-Invariant Learning Dynamics for Two-Player Zero-Sum Games
...and 32 more sections

Key Result

Theorem 1

In two-player zero-sum games with a payoff matrix $A$, there exist scale-free and scale-invariant learning dynamics such that the external regrets of the $x$- and $y$-players are bounded by $A_{\mathrm{diff}} \log m$, where $m$ is the maximum number of actions among the players and $A_{\mathrm{diff}}$ …
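
The step from a regret bound to a convergence rate is the standard one for zero-sum games, sketched here under the assumed convention that the $x$-player minimizes and the $y$-player maximizes $x^\top A y$ (the paper's sign convention may differ). Writing $\bar{x} = \frac{1}{T}\sum_{t=1}^{T} x_t$ and $\bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t$ for the averaged strategies, the realized payoffs cancel when the two regrets are added:

$$
\mathrm{Reg}_x^T + \mathrm{Reg}_y^T
= \Big( \sum_{t=1}^{T} x_t^\top A y_t - \min_{x} \sum_{t=1}^{T} x^\top A y_t \Big)
+ \Big( \max_{y} \sum_{t=1}^{T} x_t^\top A y - \sum_{t=1}^{T} x_t^\top A y_t \Big)
= T \Big( \max_{y} \bar{x}^\top A y - \min_{x} x^\top A \bar{y} \Big).
$$

The term in the last parentheses is the duality gap of $(\bar{x}, \bar{y})$, so if both regrets are $O(A_{\mathrm{diff}} \log m)$, the averaged strategies form an $O(A_{\mathrm{diff}} \log m / T)$-approximate Nash equilibrium.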

Theorems & Definitions (58)

Theorem 1: Informal version of \ref{thm:main}
Theorem 2: Informal version of \ref{thm:indiv_swapreg}
Theorem 3: freund99adaptive
Theorem 4: foster97calibrated
Definition 1: Scale-invariant / scale-free online linear optimization
Definition 2: Scale-invariant / scale-free learning dynamics in games
Theorem 5: Scale-invariant fast convergence to Nash equilibrium
Remark 1
Lemma 1
Lemma 2
...and 48 more
