
Gradient descent GAN optimization is locally stable

Vaishnavh Nagarajan, J. Zico Kolter

TL;DR

The paper analyzes the stability of gradient-descent GAN optimization where generator and discriminator are updated simultaneously. It proves local exponential stability around favorable equilibria under curvature-type assumptions and shows that Wasserstein GANs can exhibit non-convergent limit cycles in this setting. To address instability, it introduces a gradient-based regularization term on the discriminator gradient that guarantees local stability across GAN variants and can speed convergence while mitigating mode collapse. Empirical results on synthetic mixtures and MNIST-like data demonstrate improved convergence and mode coverage compared to vanilla and unrolled GAN baselines.
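The update scheme described above can be sketched on a toy problem. This is a minimal sketch under two assumptions that are ours, not the paper's. First, the objective is a hypothetical bilinear one, V(theta_d, theta_g) = theta_d * theta_g, with its equilibrium at (0, 0). Second, the regularizer is added to the generator's objective as eta * (dV/dtheta_d)**2; the paper's exact placement and scaling may differ. The discriminator takes an ascent step and the generator takes a descent step, both computed from the same old parameters:

```python
import math

# Hypothetical toy (not the paper's experimental setup): bilinear objective
# V(theta_d, theta_g) = theta_d * theta_g, with its equilibrium at (0, 0).


def grad_v(theta_d: float, theta_g: float) -> tuple[float, float]:
    """Return (dV/dtheta_d, dV/dtheta_g) for V = theta_d * theta_g."""
    return theta_g, theta_d


def simultaneous_step(theta_d, theta_g, lr, eta):
    """One simultaneous gradient step with an (assumed) gradient-norm penalty.

    Discriminator ascends V:  theta_d += lr * dV/dtheta_d
    Generator descends V + eta * (dV/dtheta_d)**2. Here (dV/dtheta_d)**2 equals
    theta_g**2, whose derivative with respect to theta_g is 2 * theta_g.
    """
    g_d, g_g = grad_v(theta_d, theta_g)
    g_g_reg = g_g + eta * 2.0 * theta_g
    # Both updates use the old parameters, i.e. they are simultaneous.
    return theta_d + lr * g_d, theta_g - lr * g_g_reg


def run(theta_d, theta_g, lr=0.05, eta=0.0, steps=1000):
    """Apply `steps` simultaneous updates and return the final parameters."""
    for _ in range(steps):
        theta_d, theta_g = simultaneous_step(theta_d, theta_g, lr, eta)
    return theta_d, theta_g


if __name__ == "__main__":
    for eta in (0.0, 0.5):
        d, g = run(1.0, 0.0, eta=eta)
        print(f"eta={eta}: distance to equilibrium = {math.hypot(d, g):.3g}")
```

The run starts from (1, 0) with lr = 0.05. Without regularization, the iterates drift away from the equilibrium. For this objective, each step multiplies the distance by exactly sqrt(1 + lr**2), so after 1000 steps the distance is 1.0025**500, roughly 3.5. With eta = 0.5, the distance instead shrinks geometrically toward zero.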

Abstract

Despite the growing prominence of generative adversarial networks (GANs), optimization in GANs is still a poorly understood topic. In this paper, we analyze the "gradient descent" form of GAN optimization, i.e., the natural setting where we simultaneously take small gradient steps in both generator and discriminator parameters. We show that even though GAN optimization does not correspond to a convex-concave game (even for simple parameterizations), under proper conditions, equilibrium points of this optimization procedure are still \emph{locally asymptotically stable} for the traditional GAN formulation. On the other hand, we show that the recently proposed Wasserstein GAN can have non-convergent limit cycles near equilibrium. Motivated by this stability analysis, we propose an additional regularization term for gradient descent GAN updates, which \emph{is} able to guarantee local stability for both the WGAN and the traditional GAN, and also shows practical promise in speeding up convergence and addressing mode collapse.
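A hypothetical toy makes both claims concrete; the construction is ours, not the paper's. Take a bilinear objective $V(\theta_D, \theta_G) = \theta_D \theta_G$, which loosely resembles a WGAN with a linear critic and has its equilibrium at the origin. Run the continuous-time simultaneous gradient flow, with the generator additionally minimizing an assumed penalty $\eta (\partial V / \partial \theta_D)^2$. The squared distance to the equilibrium then evolves as:

```latex
\begin{aligned}
\dot\theta_D &= \frac{\partial V}{\partial \theta_D} = \theta_G, \qquad
\dot\theta_G = -\frac{\partial}{\partial \theta_G}\Big(V + \eta \Big(\frac{\partial V}{\partial \theta_D}\Big)^2\Big) = -\theta_D - 2\eta\,\theta_G, \\
\frac{d}{dt}\big(\theta_D^2 + \theta_G^2\big) &= 2\theta_D\dot\theta_D + 2\theta_G\dot\theta_G
 = 2\theta_D\theta_G - 2\theta_G\theta_D - 4\eta\,\theta_G^2 = -4\eta\,\theta_G^2 .
\end{aligned}
```

With $\eta = 0$, the squared distance is conserved, so every trajectory circles the equilibrium without converging. With $\eta > 0$, the squared distance is non-increasing and stays constant only where $\theta_G = 0$. On that set $\dot\theta_G = -\theta_D$, so the only trajectory that can remain there is the equilibrium itself, and a LaSalle-type invariance argument gives convergence to the origin.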

Paper Structure

This paper contains 33 sections, 25 theorems, 109 equations, 16 figures.

Key Result

Proposition 3.1

The generic GAN objective can be a concave-concave objective, i.e., concave with respect to both the discriminator and the generator parameters, over a large part of the discriminator space, including regions arbitrarily close to the equilibrium.
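A hypothetical one-dimensional instance shows how this can happen. It is an illustrative construction, not necessarily the one used in the paper's proof. Take a discriminator logit $D_w(x) = w x^2$, a generator $G_a(z) = a z$, and the log-loss $f(t) = -\log(1 + e^{-t}) = \log\sigma(t)$. This gives $V(w, a) = \mathbb{E}_x[f(w x^2)] + \mathbb{E}_z[f(-w a^2 z^2)]$, with the discriminator maximizing over $w$ and the generator minimizing over $a$. Since $f'(t) = \sigma(-t) > 0$ and $f''(t) = -\sigma(t)\sigma(-t) < 0$, the second derivatives are:

```latex
\begin{aligned}
\frac{\partial^2 V}{\partial w^2} &= \mathbb{E}_x\!\big[x^4 f''(w x^2)\big] + \mathbb{E}_z\!\big[a^4 z^4 f''(-w a^2 z^2)\big] \le 0, \\
\frac{\partial^2 V}{\partial a^2} &= \mathbb{E}_z\!\big[4 w^2 a^2 z^4 f''(-w a^2 z^2) - 2 w z^2 f'(-w a^2 z^2)\big] \le 0 \quad \text{for } w \ge 0 .
\end{aligned}
```

So for every $w > 0$, however small, $V$ is concave in both $w$ and $a$. Because the generator minimizes over $a$, the game is not convex-concave there. If $x$ and $z$ share the same distribution, both partial derivatives of $V$ vanish at $(w, a) = (0, 1)$, so that point is an equilibrium, and the region $w > 0$ comes arbitrarily close to it.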

Figures (16)

  • Figure 2: Gradient regularized (left) and traditional (right) DCGAN architectures on stacked MNIST examples, after 1, 4, and 20 epochs.
  • Figure 3: Streamline plots around the equilibrium $(0,1)$ for the conventional GAN (top) and the WGAN (bottom) for $\eta=0$ (vanilla updates) and $\eta =0.25,0.5,1$ (left to right).
  • Figure 5: Illustration of the multiple-equilibria theorem: $\mathcal{S}$ is the neighborhood within which $\boldsymbol{\theta}$ converges exponentially to $\boldsymbol{0}$, so that the state converges to a point on the $\boldsymbol{\gamma}$-axis, which corresponds to an equilibrium. However, not every initialization within $\mathcal{S}$ keeps the trajectory within $\mathcal{S}$, because there is no guarantee on how $\boldsymbol{\gamma}$ behaves, as illustrated by the dashed trajectory. We identify a smaller ball within $\mathcal{S}$ such that, for any initialization within that ball, $\boldsymbol{\gamma}$ is well-behaved, which in turn ensures exponential convergence of $\boldsymbol{\theta}$.
  • Figure : Iteration 0
  • Figure : Epoch 1
  • ...and 11 more figures
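The reasoning in the Figure 5 caption can be illustrated with a generic bound. The assumptions here are ours, not taken from the paper. Suppose that while the trajectory stays in $\mathcal{S}$, we have $\|\boldsymbol{\theta}(t)\| \le C e^{-\lambda t}\|\boldsymbol{\theta}(0)\|$ for some $C \ge 1$ and $\lambda > 0$. Suppose also that the drift of $\boldsymbol{\gamma}$ is controlled by $\boldsymbol{\theta}$, i.e., $\|\dot{\boldsymbol{\gamma}}(t)\| \le L\|\boldsymbol{\theta}(t)\|$. Then the total movement of $\boldsymbol{\gamma}$ is bounded:

```latex
\|\boldsymbol{\gamma}(t) - \boldsymbol{\gamma}(0)\|
\le \int_0^t \|\dot{\boldsymbol{\gamma}}(s)\|\,ds
\le L C \|\boldsymbol{\theta}(0)\| \int_0^\infty e^{-\lambda s}\,ds
= \frac{L C}{\lambda}\,\|\boldsymbol{\theta}(0)\| .
```

Starting in a ball small enough that both $C\|\boldsymbol{\theta}(0)\|$ and this drift bound keep the state inside $\mathcal{S}$ keeps the whole trajectory inside $\mathcal{S}$, by a standard continuation argument. This is the role the smaller ball plays in the figure.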

Theorems & Definitions (47)

  • Proposition 3.1
  • Theorem 3.1
  • Theorem 3.2
  • Definition A.1: Stability
  • Theorem A.1: Lyapunov function
  • Theorem A.2: Linearization
  • Theorem A.3: Corollary of LaSalle's invariance principle (Corollary 4.1 from Khalil, 1996)
  • Theorem A.4
  • Proof
  • Lemma C.1
  • ...and 37 more
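The linearization theorem listed above ties local stability to the eigenvalues of the Jacobian of the update dynamics at an equilibrium. Consider a hypothetical bilinear toy $V(\theta_D, \theta_G) = \theta_D \theta_G$, which is not the paper's system, with an assumed generator penalty $\eta(\partial V/\partial\theta_D)^2$. Its continuous-time Jacobian at the origin is $[[0, 1], [-1, -2\eta]]$. The sketch below computes its eigenvalues for the same $\eta$ values shown in Figure 3:

```python
import cmath


def jacobian_eigenvalues(eta: float) -> tuple[complex, complex]:
    """Eigenvalues of J = [[0, 1], [-1, -2*eta]].

    J is the Jacobian at the origin of the hypothetical regularized toy flow
        d(theta_d)/dt = theta_g,  d(theta_g)/dt = -theta_d - 2*eta*theta_g.
    For a 2x2 matrix the eigenvalues solve lam**2 - trace*lam + det = 0.
    """
    trace = 0.0 + (-2.0 * eta)
    det = 0.0 * (-2.0 * eta) - 1.0 * (-1.0)  # always 1 for this J
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


for eta in (0.0, 0.25, 0.5, 1.0):
    lam1, lam2 = jacobian_eigenvalues(eta)
    print(f"eta={eta}: {lam1:.3f}, {lam2:.3f}")
```

For this toy, the largest real part is exactly $-\eta$. With vanilla updates it is zero (eigenvalues $\pm i$), where linearization alone is inconclusive. For every $\eta > 0$ it is negative, which by linearization gives local exponential stability of the toy.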