The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization

Constantinos Daskalakis, Ioannis Panageas

TL;DR

This work analyzes the last-iterate convergence of first-order min–max dynamics, focusing on Gradient Descent/Ascent (GDA) and Optimistic Gradient Descent/Ascent (OGDA) in unconstrained settings. Using a dynamical-systems framework and center-stable manifold theory, it shows that both dynamics avoid unstable fixed points for almost all initializations and, for small step sizes, proves a hierarchy of stable points: Local min–max ⊆ GDA-stable ⊆ OGDA-stable, where the inclusions are strict in some cases. A key step relates the OGDA and GDA Jacobians through a quadratic eigenvalue relation, from which it follows that OGDA stability subsumes GDA stability; explicit examples show OGDA stabilizing points that GDA cannot. The results clarify last-iterate behavior in min–max optimization, with implications for training GANs and similar systems, and are supported by 2D and higher-dimensional experiments.

Abstract

Motivated by applications in Optimization, Game Theory, and the training of Generative Adversarial Networks, the convergence properties of first order methods in min-max problems have received extensive study. It has been recognized that they may cycle, and there is no good understanding of their limit points when they do not. When they converge, do they converge to local min-max solutions? We characterize the limit points of two basic first order methods, namely Gradient Descent/Ascent (GDA) and Optimistic Gradient Descent/Ascent (OGDA). We show that both dynamics avoid unstable critical points for almost all initializations. Moreover, for small step sizes and under mild assumptions, the set of OGDA-stable critical points is a superset of GDA-stable critical points, which is a superset of local min-max solutions (strict in some cases). The connecting thread is that the behavior of these dynamics can be studied from a dynamical systems perspective.
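
To make the two dynamics concrete, the following is a minimal sketch of their update rules, assuming the commonly used forms: GDA takes a gradient descent step in x and a gradient ascent step in y, while OGDA doubles the current gradient and subtracts the previous one. The bilinear objective f(x, y) = xy, the step size, the starting point, and all function names are illustrative assumptions and are not taken from the text above. On this toy objective the GDA iterates are expected to drift away from the critical point (0, 0), while the OGDA iterates contract towards it.

```python
import math

def grad(x, y):
    """Gradient (df/dx, df/dy) of the illustrative objective f(x, y) = x * y."""
    return y, x

def run_gda(x, y, alpha, steps):
    """Simultaneous GDA: descend in x, ascend in y."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - alpha * gx, y + alpha * gy
    return x, y

def run_ogda(x, y, alpha, steps):
    """OGDA: twice the current gradient minus the previous gradient."""
    # Assumption: the "previous" iterate at step 0 equals the starting point.
    gx_prev, gy_prev = grad(x, y)
    for _ in range(steps):
        gx, gy = grad(x, y)
        x_new = x - 2 * alpha * gx + alpha * gx_prev
        y_new = y + 2 * alpha * gy - alpha * gy_prev
        gx_prev, gy_prev = gx, gy
        x, y = x_new, y_new
    return x, y

alpha, steps = 0.1, 2000  # arbitrary illustrative values
gda_norm = math.hypot(*run_gda(1.0, 1.0, alpha, steps))
ogda_norm = math.hypot(*run_ogda(1.0, 1.0, alpha, steps))
print(f"GDA  distance from (0, 0): {gda_norm:.3e}")
print(f"OGDA distance from (0, 0): {ogda_norm:.3e}")
```

For this objective each GDA step multiplies the distance from the origin by sqrt(1 + alpha^2), so the growth does not depend on how small the step size is; the OGDA contraction is slow for small alpha, which is why the sketch runs for a few thousand steps.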

Paper Structure

This paper contains 20 sections, 16 theorems, 32 equations, 2 figures, 1 table.

Key Result

Proposition 1.4

If the Jacobian of the update rule at a stable fixed point $\mathbf{z}$ has spectral radius less than one, then the fixed point is asymptotically stable. Therefore, if a fixed point $\mathbf{z}$ is hyperbolic, then linear stability implies asymptotic stability.
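
The spectral-radius condition can be checked numerically. The sketch below assumes the GDA map (x, y) -> (x - alpha * f_x, y + alpha * f_y) for a two-variable objective, whose Jacobian at a critical point is the identity plus alpha times a signed arrangement of the second derivatives. The helper names `spectral_radius_2x2` and `gda_jacobian` are hypothetical, and the second derivatives are those of the quadratic listed under Figure 1.

```python
import cmath

def spectral_radius_2x2(m):
    """Largest eigenvalue modulus of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    # Roots of the characteristic polynomial; cmath handles complex pairs.
    root = cmath.sqrt(trace * trace - 4 * det)
    return max(abs((trace + root) / 2), abs((trace - root) / 2))

def gda_jacobian(f_xx, f_xy, f_yy, alpha):
    """Jacobian of the assumed GDA map (x - alpha*f_x, y + alpha*f_y)."""
    return [[1 - alpha * f_xx, -alpha * f_xy],
            [alpha * f_xy, 1 + alpha * f_yy]]

# Second derivatives of the Figure 1 quadratic:
# f_xx = -1/4, f_xy = 6/10, f_yy = -1, with step size alpha = 0.001.
J = gda_jacobian(-0.25, 0.6, -1.0, alpha=0.001)
rho = spectral_radius_2x2(J)
print(f"spectral radius of the GDA Jacobian at (0, 0): {rho:.6f}")
print("asymptotically stable by the proposition:", rho < 1)
```

For these values the eigenvalues work out to 1 - 0.2 * alpha and 1 - 0.55 * alpha, so the spectral radius sits just below one and the proposition applies.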

Figures (2)

  • Figure 1: The function $f(x,y) = -\frac{1}{8}x^2 - \frac{1}{2}y^2 + \frac{6}{10}xy$ with step size $\alpha = 0.001$. The arrows point towards the next step of the Gradient Descent/Ascent dynamics. The system converges to the point $(0,0)$, which is GDA-stable but is not a local min-max critical point.
  • Figure 2: Construction of a function with points that are GDA-stable and local min-max, GDA-stable but not local min-max, and GDA-unstable (and hence not local min-max). The arrows point towards the next step of the Gradient Descent/Ascent dynamics.
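
The behavior described for Figure 1 can be reproduced with a short simulation. This is a sketch under stated assumptions: the starting point (1, 1) and the number of steps are arbitrary choices, the function names are hypothetical, and the GDA step is taken to be descent in x and ascent in y. The last lines probe the min-max condition directly: a local min-max point at (0, 0) would need f(0, 0) <= f(x, 0) for small x.

```python
def f(x, y):
    """Figure 1 objective: -x^2/8 - y^2/2 + (6/10) x y."""
    return -x * x / 8 - y * y / 2 + 0.6 * x * y

def gda_step(x, y, alpha):
    """One GDA step: descend in x, ascend in y."""
    fx = -x / 4 + 0.6 * y   # df/dx
    fy = -y + 0.6 * x       # df/dy
    return x - alpha * fx, y + alpha * fy

x, y = 1.0, 1.0   # hypothetical starting point
alpha = 0.001     # step size from the Figure 1 caption
for _ in range(100_000):
    x, y = gda_step(x, y, alpha)
print(f"final iterate: ({x:.2e}, {y:.2e})")

# Moving along x from the origin lowers f, so x = 0 does not minimize f(., 0).
eps = 0.1
violates_min = f(eps, 0.0) < f(0.0, 0.0)
print("f(eps, 0) < f(0, 0):", violates_min)
```

The iterates shrink towards (0, 0) even though x = 0 is a maximizer, not a minimizer, of f(x, 0), which matches the caption's claim that the point is GDA-stable without being local min-max. Convergence is slow at this step size, hence the large step count.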

Theorems & Definitions (33)

  • Definition 1.1: (Linear) stability
  • Definition 1.2: Lyapunov and Asymptotic Stability
  • Definition 1.3: Hyperbolicity
  • Proposition 1.4: e.g. G07
  • Remark 1.5: Fixed points of GDA, OGDA dynamics
  • Definition 1.6
  • Remark 1.9
  • Theorem 1.10: Inclusion
  • Theorem 1.11: Avoid unstable
  • Lemma 2.1: GDA is a local diffeomorphism
  • ...and 23 more