On the $O(1/T)$ Convergence of Alternating Gradient Descent-Ascent in Bilinear Games

Tianlong Nan, Shuvomoy Das Gupta, Garud Iyengar, Christian Kroer

TL;DR

We study AltGDA and SimGDA in two-player zero-sum bilinear games and compare their convergence properties under constraints. We prove that AltGDA with a small constant stepsize achieves an $O(1/T)$ ergodic convergence rate when an interior Nash equilibrium exists, and we establish a local $O(1/T)$ rate for games without an interior equilibrium. We introduce a performance estimation programming (PEP) framework based on semidefinite programming to jointly optimize the stepsize and the worst-case rate. It indicates that AltGDA may converge at $O(1/T)$ for finite horizons, while SimGDA appears limited to $O(1/\sqrt{T})$ in similar regimes. Numerical experiments corroborate the theoretical rates and illustrate AltGDA’s practical advantage over SimGDA in constrained minimax problems.
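For concreteness, the two update rules can be written side by side. This is a sketch under assumptions not stated in this summary. The game is taken to be $\min_{\bm{x} \in \Delta_n} \max_{\bm{y} \in \Delta_m} \bm{y}^\top A \bm{x}$, whose bilinear form matches the term $(\bm{y}^{t})^\top A \bm{x}$ in Lemma 1. The min/max roles and the choice to update $\bm{x}$ first are assumed. $\Pi_{\Delta}$ denotes Euclidean projection onto a simplex.

```latex
% Assumed: x minimizes, y maximizes y^T A x; x is updated first.
\begin{aligned}
\text{SimGDA:}\quad \bm{x}^{t+1} &= \Pi_{\Delta_n}\!\big(\bm{x}^{t} - \eta A^\top \bm{y}^{t}\big), &
\bm{y}^{t+1} &= \Pi_{\Delta_m}\!\big(\bm{y}^{t} + \eta A \bm{x}^{t}\big),\\
\text{AltGDA:}\quad \bm{x}^{t+1} &= \Pi_{\Delta_n}\!\big(\bm{x}^{t} - \eta A^\top \bm{y}^{t}\big), &
\bm{y}^{t+1} &= \Pi_{\Delta_m}\!\big(\bm{y}^{t} + \eta A \bm{x}^{t+1}\big).
\end{aligned}
```

Under this convention the only difference is that the AltGDA $\bm{y}$-update responds to the fresh iterate $\bm{x}^{t+1}$ rather than to $\bm{x}^{t}$.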

Abstract

We study the alternating gradient descent-ascent (AltGDA) algorithm in two-player zero-sum games. Alternating methods, where players take turns to update their strategies, have long been recognized as simple and practical approaches for learning in games, exhibiting much better numerical performance than their simultaneous counterparts. However, our theoretical understanding of alternating algorithms remains limited, and results are mostly restricted to the unconstrained setting. We show that for two-player zero-sum games that admit an interior Nash equilibrium, AltGDA converges at an $O(1/T)$ ergodic convergence rate when employing a small constant stepsize. This is the first result showing that alternation improves over the simultaneous counterpart of GDA in the constrained setting. For games without an interior equilibrium, we show an $O(1/T)$ local convergence rate with a constant stepsize that is independent of any game-specific constants. In a more general setting, we develop a performance estimation programming (PEP) framework to jointly optimize the AltGDA stepsize along with its worst-case convergence rate. The PEP results indicate that AltGDA may achieve an $O(1/T)$ convergence rate for a finite horizon $T$, whereas its simultaneous counterpart appears limited to an $O(1/\sqrt{T})$ rate.
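A minimal Python sketch of the comparison follows. It assumes the game $\min_{\bm{x} \in \Delta_n} \max_{\bm{y} \in \Delta_m} \bm{y}^\top A \bm{x}$ with $\bm{x}$ updated first, and it takes ergodic averages over $t = 1, \ldots, T$. These are assumptions, not details confirmed by this summary. The helpers `project_simplex`, `run_gda`, and `duality_gap` are hypothetical names. The projection is the standard sort-based Euclidean projection onto the simplex. The two methods differ only in which $\bm{x}$-iterate the $\bm{y}$-player responds to.

```python
from typing import List

# Assumed convention (not from the paper): x minimizes, y maximizes y^T A x,
# and x moves first in each round.
Vector = List[float]
Matrix = List[List[float]]  # A is m x n: x lives in Delta_n, y in Delta_m


def project_simplex(v: Vector) -> Vector:
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - 1.0) / k
        if uk - t > 0:  # holds exactly for k = 1, ..., rho; keep the last threshold
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def mat_vec(A: Matrix, x: Vector) -> Vector:
    """Return A x (length m)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def mat_t_vec(A: Matrix, y: Vector) -> Vector:
    """Return A^T y (length n)."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def duality_gap(A: Matrix, x: Vector, y: Vector) -> float:
    """max over Delta_m of y'^T A x minus min over Delta_n of y^T A x'.

    Both inner problems are linear over a simplex, so they are attained at vertices.
    """
    return max(mat_vec(A, x)) - min(mat_t_vec(A, y))


def run_gda(A: Matrix, x0: Vector, y0: Vector, eta: float, T: int,
            alternating: bool = True) -> float:
    """Run T steps of projected AltGDA (or SimGDA); return the gap of the averages."""
    x, y = list(x0), list(y0)
    x_sum, y_sum = [0.0] * len(x), [0.0] * len(y)
    for _ in range(T):
        # Minimizing player: descent step; the gradient of y^T A x in x is A^T y.
        x_new = project_simplex([xi - eta * g for xi, g in zip(x, mat_t_vec(A, y))])
        # AltGDA lets y respond to the fresh x; SimGDA uses the previous x.
        x_seen = x_new if alternating else x
        # Maximizing player: ascent step; the gradient in y is A x.
        y = project_simplex([yi + eta * g for yi, g in zip(y, mat_vec(A, x_seen))])
        x = x_new
        x_sum = [s + xi for s, xi in zip(x_sum, x)]
        y_sum = [s + yi for s, yi in zip(y_sum, y)]
    return duality_gap(A, [s / T for s in x_sum], [s / T for s in y_sum])


# Rock-paper-scissors: skew-symmetric payoff whose unique NE (uniform) is interior.
RPS: Matrix = [[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]]

if __name__ == "__main__":
    for alt in (True, False):
        gap = run_gda(RPS, [0.4, 0.3, 0.3], [0.3, 0.4, 0.3], eta=0.1, T=1000,
                      alternating=alt)
        print("AltGDA" if alt else "SimGDA", gap)
```

Running the script prints the duality gap of the averaged iterates for both methods on rock-paper-scissors. Plain lists keep the sketch dependency-free. For larger instances such as the $10 \times 20$ and $30 \times 60$ games in the figures, a NumPy version would be the more practical choice.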

Paper Structure

This paper contains 23 sections, 16 theorems, 117 equations, 8 figures, 3 tables, 1 algorithm.

Key Result

Lemma 1

Let $\{ (\bm{x}^t, \bm{y}^t) \}_{t = 0, 1, \ldots }$ be a sequence of iterates generated by AltGDA with $\eta > 0$. Then, for any $(\bm{x}, \bm{y}) \in \Delta_n \times \Delta_m$, we have …, where $\phi_{t}(\bm{x}, \bm{y}) := \frac{1}{2}\lVert \bm{x}^{t}-\bm{x} \rVert_2^{2} + \frac{1}{2}\lVert \bm{y}^{t}-\bm{y} \rVert_2^{2} + \eta (\bm{y}^{t})^\top A \bm{x}$ and $\psi_{t}(\bm{x},\bm{y}) := \ldots$
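As a small illustration of Lemma 1, the quantity $\phi_t$ can be evaluated directly from its definition. The sketch below stores $A$ as a list of $m$ rows and vectors as plain Python lists. `phi` is a hypothetical helper name. Only $\phi_t$ is covered, since the definition of $\psi_t$ is cut off.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # A has m rows and n columns: x in Delta_n, y in Delta_m


def phi(A: Matrix, eta: float, x_t: Vector, y_t: Vector, x: Vector, y: Vector) -> float:
    """Evaluate phi_t(x, y) = 1/2||x^t - x||^2 + 1/2||y^t - y||^2 + eta * (y^t)^T A x.

    Hypothetical helper: (x_t, y_t) is the AltGDA iterate at step t and
    (x, y) is any reference point in Delta_n x Delta_m.
    """
    dist_x = sum((a - b) ** 2 for a, b in zip(x_t, x))
    dist_y = sum((a - b) ** 2 for a, b in zip(y_t, y))
    A_x = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
    # The cross term pairs the *current* y-iterate with the *reference* x.
    cross = sum(y_i * ax_i for y_i, ax_i in zip(y_t, A_x))
    return 0.5 * dist_x + 0.5 * dist_y + eta * cross


if __name__ == "__main__":
    rps = [[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]]
    uniform = [1.0 / 3.0] * 3
    # Pure-strategy iterates measured against the uniform equilibrium of rock-paper-scissors.
    print(phi(rps, 0.1, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], uniform, uniform))
```

The reference point is an explicit argument, so the same function can track $\phi_t$ along a whole trajectory against a fixed equilibrium.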

Figures (8)

  • Figure 1: Optimized stepsizes and corresponding optimized objective values for $T = 5, 6, \ldots, 50$ via PEP. The left plot shows the optimized stepsizes. The right plot shows, on a log scale, the optimized objective value, i.e., the worst-case performance measure (the duality gap of the averaged iterates) under the optimized stepsizes.
  • Figure 2: Numerical results on the rock-paper-scissors game. From left to right: the trajectories of the AltGDA iterates (in ternary plots), the changes in duality gaps, and the evolution of the energy functions.
  • Figure 3: Numerical results on a $3 \times 3$ random matrix instance without an interior NE. The experimental setup is the same as in Figure 2.
  • Figure 4: Numerical performance of AltGDA and SimGDA on $10 \times 20$ synthesized matrix games.
  • Figure 5: Numerical performance of AltGDA and SimGDA on $30 \times 60$ synthesized matrix games.
  • ...and 3 more figures
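The figure captions measure performance by the duality gap. Two assumptions are made here, neither taken from the paper: $\bm{x} \in \Delta_n$ minimizes and $\bm{y} \in \Delta_m$ maximizes $\bm{y}^\top A \bm{x}$, and the averages run over $t = 1, \ldots, T$. Under them, each inner optimization is linear over a simplex, so it reduces to a coordinate max or min:

```latex
% Assumed: \bar{x}_T = \frac{1}{T}\sum_{t=1}^{T} x^t, and \bar{y}_T is defined likewise.
\begin{aligned}
\operatorname{Gap}(\bar{\bm{x}}_T, \bar{\bm{y}}_T)
 &= \max_{\bm{y} \in \Delta_m} \bm{y}^\top A \bar{\bm{x}}_T
  - \min_{\bm{x} \in \Delta_n} \bar{\bm{y}}_T^\top A \bm{x} \\
 &= \max_{1 \le i \le m} (A \bar{\bm{x}}_T)_i - \min_{1 \le j \le n} (A^\top \bar{\bm{y}}_T)_j
  \;\ge\; \bar{\bm{y}}_T^\top A \bar{\bm{x}}_T - \bar{\bm{y}}_T^\top A \bar{\bm{x}}_T = 0.
\end{aligned}
```

The gap is zero exactly when $(\bar{\bm{x}}_T, \bar{\bm{y}}_T)$ is a Nash equilibrium.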

Theorems & Definitions (31)

  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Lemma 3
  • Lemma 4
  • Theorem 2
  • Lemma 5
  • Proof
  • Proof of the lemma labeled lem:two-bounds
  • Lemma 6
  • ...and 21 more