Adaptively Perturbed Mirror Descent for Learning in Games

Kenshi Abe; Kaito Ariu; Mitsuki Sakamoto; Atsushi Iwasaki

TL;DR

This paper proposes Adaptively Perturbed Mirror Descent (APMD), a payoff perturbation technique for the Mirror Descent algorithm. APMD adjusts the magnitude of the perturbation by repeatedly updating the slingshot strategy at a predefined interval, which allows it to find a Nash equilibrium of the underlying game with guaranteed rates.

Abstract

This paper proposes a payoff perturbation technique for the Mirror Descent (MD) algorithm in games where the gradient of the payoff functions is monotone in the strategy profile space and the observed gradients may contain additive noise. The optimistic family of learning algorithms, exemplified by optimistic MD, achieves last-iterate convergence in noise-free scenarios, leading the dynamics to a Nash equilibrium. Recent work has renewed interest in the perturbation approach, where payoff functions are perturbed based on the distance from an anchoring, or slingshot, strategy. In response, we propose Adaptively Perturbed MD (APMD), which adjusts the magnitude of the perturbation by repeatedly updating the slingshot strategy at a predefined interval. This adaptation allows us to find a Nash equilibrium of the underlying game with guaranteed rates. Empirical demonstrations affirm that our algorithm exhibits significantly accelerated convergence.

Paper Structure (69 sections, 38 theorems, 200 equations, 9 figures, 2 tables, 1 algorithm)

Introduction
Preliminaries
Monotone games.
Nash equilibrium and gap function.
Problem setting.
Mirror Descent.
Other notations.
Adaptively Perturbed Mirror Descent
Slingshot Payoff Perturbation
Slingshot Strategy Update
Last-Iterate Convergence Rates
Full Feedback Setting
Proof Sketch of Theorem 4.1
(1) Convergence rates to a stationary point with $k$-th slingshot strategy profile.
(2) Upper bound on the gap function.
...and 54 more sections

Key Result

Theorem 4.1

If we use the constant learning rate $\eta_t = \eta \in (0, \frac{2\mu}{3\mu^2 + 8L^2})$, and set $D_{\psi}$ and $G$ as the squared $\ell^2$-distance $D_{\psi}(\pi_i, \pi_i') = G(\pi_i, \pi_i') = \|\pi_i - \pi_i'\|^2/2$, and set $T_{\sigma} = \Theta(\ln T)$, then the strategy profile $\pi^T$ updated
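
To make the setting of Theorem 4.1 concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that, with $D_{\psi}$ and $G$ both set to the squared $\ell^2$-distance, each update reduces to a projected gradient step on the payoff gradient minus $\mu(\pi_i - \sigma_i)$, and that the slingshot $\sigma$ is reset to the current strategy every $T_{\sigma}$ iterations. The rock-paper-scissors game, the starting strategies, the values of `mu`, `eta`, `t_sigma`, and `num_iters`, and the helper names (`project_simplex`, `payoff_gradients`, `gap`, `apmd`) are all illustrative assumptions, as is taking $L$ to be the spectral norm of the payoff matrix.

```python
import math

# Rock-paper-scissors payoffs for player 1; player 2 receives the negative.
# The game is an illustrative assumption, not taken from the paper's experiments.
A = [[0.0, -1.0, 1.0],
     [1.0, 0.0, -1.0],
     [-1.0, 1.0, 0.0]]


def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:  # holds on a prefix; keep the last valid shift
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def payoff_gradients(x, y):
    """Gradients of each player's payoff: A y for player 1, -A^T x for player 2."""
    grad_x = [sum(A[i][j] * y[j] for j in range(3)) for i in range(3)]
    grad_y = [-sum(A[i][j] * x[i] for i in range(3)) for j in range(3)]
    return grad_x, grad_y


def gap(x, y):
    """Gap function for this two-player zero-sum matrix game."""
    grad_x, grad_y = payoff_gradients(x, y)
    return max(grad_x) + max(grad_y)


def apmd(x, y, eta, mu, t_sigma, num_iters):
    """Sketch of perturbed MD with squared L2 distances and periodic slingshot updates."""
    sigma_x, sigma_y = list(x), list(y)
    for t in range(1, num_iters + 1):
        grad_x, grad_y = payoff_gradients(x, y)
        # Perturbed gradient step: payoff gradient minus mu * (pi - sigma).
        new_x = project_simplex(
            [x[i] + eta * (grad_x[i] - mu * (x[i] - sigma_x[i])) for i in range(3)])
        new_y = project_simplex(
            [y[i] + eta * (grad_y[i] - mu * (y[i] - sigma_y[i])) for i in range(3)])
        x, y = new_x, new_y
        if t % t_sigma == 0:  # slingshot update at a fixed interval
            sigma_x, sigma_y = list(x), list(y)
    return x, y


mu = 1.0
L = math.sqrt(3.0)  # spectral norm of A, assumed to serve as the Lipschitz constant
eta_upper = 2 * mu / (3 * mu ** 2 + 8 * L ** 2)  # learning-rate bound from Theorem 4.1
eta = 0.05          # chosen below eta_upper
num_iters = 2000
t_sigma = 20        # a small multiple of ln(num_iters); the constant is a guess

x0, y0 = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
initial_gap = gap(x0, y0)
x_final, y_final = apmd(x0, y0, eta, mu, t_sigma, num_iters)
final_gap = gap(x_final, y_final)
print(f"gap: {initial_gap:.4f} -> {final_gap:.2e}")
```

Resetting the slingshot only every `t_sigma` steps lets the iterate approach the stationary point of the current perturbed game before the anchor moves; the interval here is a hand-picked value, not a tuned or recommended one.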

Figures (9)

Figure 1: Illustration of the impact of the slingshot strategy updates on the gap function for $\pi^t$ updated by APMD.
Figure 2: The gap function for $\pi^t$ for APMD, MWU, and OMWU with full feedback. The shaded area represents the standard errors. Note that the KL divergence, reverse KL divergence, and squared $\ell^2$-distance are abbreviated to KL, RKL, and L2, respectively.
Figure 3: The gap function for $\pi^t$ for APMD, MWU, and OMWU with noisy feedback.
Figure 4: The gap function for $\pi^t$ for APMD, APFTRL, MWU, OMWU, and OGD with full feedback. The shaded area represents the standard errors. Note that the KL divergence, reverse KL divergence, and squared $\ell^2$-distance are abbreviated to KL, RKL, and L2, respectively.
Figure 5: The gap function for $\pi^t$ for APMD, APFTRL, MWU, OMWU, and OGD with noisy feedback. The shaded area represents the standard errors.
...and 4 more figures

Theorems & Definitions (71)

Example 2.1: Concave-Convex Games
Example 2.2: Zero-Sum Polymatrix Games
Theorem 4.1
Lemma 4.2
Lemma 4.3
Lemma 4.4
Theorem 4.5
Lemma 4.6
Lemma 4.7
Example 5.1: Boltzmann Q-Learning Tuyls2006
...and 61 more
