
Adaptively Perturbed Mirror Descent for Learning in Games

Kenshi Abe, Kaito Ariu, Mitsuki Sakamoto, Atsushi Iwasaki

TL;DR

This paper proposes a payoff perturbation technique for the Mirror Descent algorithm, called APMD, which adjusts the magnitude of the perturbation by repeatedly updating the slingshot strategy at a predefined interval to find a Nash equilibrium of the underlying game with guaranteed rates.

Abstract

This paper proposes a payoff perturbation technique for the Mirror Descent (MD) algorithm in games where the gradient of the payoff functions is monotone in the strategy profile space, potentially containing additive noise. The optimistic family of learning algorithms, exemplified by optimistic MD, successfully achieves last-iterate convergence in scenarios devoid of noise, leading the dynamics to a Nash equilibrium. A recent re-emerging trend underscores the promise of the perturbation approach, where payoff functions are perturbed based on the distance from an anchoring, or slingshot, strategy. In response, we propose Adaptively Perturbed MD (APMD), which adjusts the magnitude of the perturbation by repeatedly updating the slingshot strategy at a predefined interval. This innovation empowers us to find a Nash equilibrium of the underlying game with guaranteed rates. Empirical demonstrations affirm that our algorithm exhibits significantly accelerated convergence.
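To make the idea in the abstract concrete, the following is a minimal sketch of a perturbed update with periodic slingshot resets. It is not the authors' implementation. It assumes a two-player zero-sum matrix game with payoff $x^\top A y$, full (noise-free) feedback, and the squared $\ell^2$-distance for both the mirror map and the perturbation, in which case the mirror step reduces to a projected gradient step. The names `apmd_matrix_game`, `project_simplex`, and `exploitability`, and all default parameter values, are hypothetical choices made for illustration.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    n = v.size
    u = np.sort(v)[::-1]                      # sort in descending order
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, n + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]  # last index satisfying the condition
    tau = css[rho] / (rho + 1)
    return np.maximum(v - tau, 0.0)


def apmd_matrix_game(A, eta=0.03, mu=1.0, t_sigma=200, n_iters=4000):
    """Sketch of adaptively perturbed MD on the zero-sum game x^T A y.

    Assumptions (not taken from the paper): player x maximizes, player y
    minimizes, both start from the uniform strategy, and the squared
    l2-distance is used for the mirror map and for the perturbation.
    """
    m, n = A.shape
    x, y = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    sx, sy = x.copy(), y.copy()               # slingshot (anchoring) strategies
    for t in range(1, n_iters + 1):
        # Payoff gradient minus mu times the gradient of ||pi - sigma||^2 / 2.
        gx = A @ y - mu * (x - sx)
        gy = -A.T @ x - mu * (y - sy)
        # Simultaneous projected gradient step for both players.
        x, y = project_simplex(x + eta * gx), project_simplex(y + eta * gy)
        # Every t_sigma iterations, move the slingshot to the current strategy.
        if t % t_sigma == 0:
            sx, sy = x.copy(), y.copy()
    return x, y


def exploitability(A, x, y):
    """Sum of best-response gains in the zero-sum game; zero at equilibrium."""
    return float(np.max(A @ y) - np.min(A.T @ x))


if __name__ == "__main__":
    A = np.array([[2.0, -1.0], [-1.0, 1.0]])
    x, y = apmd_matrix_game(A)
    print(x, y, exploitability(A, x, y))
```

With a fixed slingshot, the iterates settle at the equilibrium of the perturbed game rather than of the original one; moving the slingshot to the current strategy every `t_sigma` steps is what lets the sequence approach an equilibrium of the underlying game in this sketch.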

Paper Structure

This paper contains 69 sections, 38 theorems, 200 equations, 9 figures, 2 tables, 1 algorithm.

Key Result

Theorem 4.1

If we use the constant learning rate $\eta_t = \eta \in (0, \frac{2\mu}{3\mu^2 + 8L^2})$, and set $D_{\psi}$ and $G$ as the squared $\ell^2$-distance $D_{\psi}(\pi_i, \pi_i') = G(\pi_i, \pi_i') = \|\pi_i - \pi_i'\|^2/2$, and set $T_{\sigma} = \Theta(\ln T)$, then the strategy profile $\pi^T$ updated
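
As a small worked example of the conditions in Theorem 4.1, the helper below evaluates the upper end of the admissible learning-rate interval and a slingshot update interval of order $\ln T$. It is an illustrative sketch only: the excerpt above does not define $\mu$ and $L$, so the code assumes $\mu$ is the perturbation strength and $L$ is a smoothness (Lipschitz) constant of the payoff gradient. The function names and the constant `c` inside the $\Theta(\ln T)$ term are hypothetical.

```python
import math


def max_learning_rate(mu, L):
    """Upper end of the open interval (0, 2*mu / (3*mu^2 + 8*L^2))."""
    return 2.0 * mu / (3.0 * mu**2 + 8.0 * L**2)


def slingshot_interval(T, c=1.0):
    """An interval of order ln(T); the constant c is an assumed choice."""
    return max(1, math.ceil(c * math.log(T)))


if __name__ == "__main__":
    mu, L = 1.0, 1.0
    # For mu = L = 1 the bound is 2 / 11, so any eta below that is admissible.
    print(max_learning_rate(mu, L))
    print(slingshot_interval(10_000))
```

Because the interval is open, a learning rate strictly below the returned bound should be used, for example a fixed fraction of it.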

Figures (9)

  • Figure 1: Illustration of the impact of the slingshot strategy updates on the gap function for $\pi^t$ updated by APMD.
  • Figure 2: The gap function for $\pi^t$ for APMD, MWU, and OMWU with full feedback. The shaded area represents the standard errors. Note that the KL divergence, reverse KL divergence, and squared $\ell^2$-distance are abbreviated to KL, RKL, and L2, respectively.
  • Figure 3: The gap function for $\pi^t$ for APMD, MWU, and OMWU with noisy feedback.
  • Figure 4: The gap function for $\pi^t$ for APMD, APFTRL, MWU, OMWU, and OGD with full feedback. The shaded area represents the standard errors. Note that the KL divergence, reverse KL divergence, and squared $\ell^2$-distance are abbreviated to KL, RKL, and L2, respectively.
  • Figure 5: The gap function for $\pi^t$ for APMD, APFTRL, MWU, OMWU, and OGD with noisy feedback. The shaded area represents the standard errors.
  • ...and 4 more figures

Theorems & Definitions (71)

  • Example 2.1: Concave-Convex Games
  • Example 2.2: Zero-Sum Polymatrix Games
  • Theorem 4.1
  • Lemma 4.2
  • Lemma 4.3
  • Lemma 4.4
  • Theorem 4.5
  • Lemma 4.6
  • Lemma 4.7
  • Example 5.1: Boltzmann Q-Learning (Tuyls et al., 2006)
  • ...and 61 more