
A spin-glass model for the loss surfaces of generative adversarial networks

Nicholas P Baskerville, Jonathan P Keating, Francesco Mezzadri, Joseph Najnudel

TL;DR

The authors propose an interacting spin-glass model for GAN loss landscapes by coupling two spherical spin glasses representing the generator and discriminator. They develop a rigorous analysis of the joint complexity and the limiting spectrum of a corresponding Hessian ensemble using Kac-Rice formulae and Random Matrix Theory with a supersymmetric approach, complemented by a Coulomb gas approximation to obtain the asymptotic complexity. Extensions to Hessian-index constrained complexity reveal a two-dimensional banded structure of critical points, offering a qualitative explanation for gradient-descent dynamics and common training outcomes in GANs. Empirical results, including comparisons with DCGAN experiments on CIFAR-10, support the qualitative predictions and demonstrate the potential of physics-inspired models to inform GAN hyperparameter choices and architectural understanding.
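The model is built from two coupled spherical spin glasses. The sketch below shows only the generic building block: a spherical $p$-spin glass $H(w) = N^{-(p-1)/2}\sum J_{i_1\dots i_p} w_{i_1}\cdots w_{i_p}$ with i.i.d. standard Gaussian couplings on the sphere $\|w\|^2 = N$. It also computes the Riemannian (tangent-space) gradient, whose zeros are the critical points that complexity calculations count. The normalisation, the unsymmetrised couplings and all function names are assumptions made for illustration. The paper's coupling between generator and discriminator, and its parameters $q$, $\sigma_z$ and $\kappa$, are not reproduced here.

```python
import numpy as np


def sample_couplings(n: int, p: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: i.i.d. N(0, 1) couplings J_{i1...ip}, one per ordered index tuple."""
    return rng.standard_normal((n,) * p)


def p_spin_energy(j: np.ndarray, w: np.ndarray) -> float:
    """H(w) = N^{-(p-1)/2} * sum_{i1..ip} J_{i1..ip} w_{i1} ... w_{ip}."""
    n, p = w.shape[0], j.ndim
    t = j
    for _ in range(p):
        # Contract the last remaining tensor index with w, one factor at a time.
        t = t @ w
    return float(t) / n ** ((p - 1) / 2)


def p_spin_gradient(j: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Euclidean gradient of H: one term per tensor slot that the free index can occupy."""
    n, p = w.shape[0], j.ndim
    grad = np.zeros(n)
    for axis in range(p):
        # Move the free index to the front, then contract the other p - 1 indices with w.
        t = np.moveaxis(j, axis, 0)
        for _ in range(p - 1):
            t = t @ w
        grad += t
    return grad / n ** ((p - 1) / 2)


def riemannian_gradient(j: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Project the Euclidean gradient onto the tangent space of the sphere at w."""
    g = p_spin_gradient(j, w)
    return g - (w @ g) / (w @ w) * w


# Usage: evaluate a 3-spin glass at a random point on the sphere of radius sqrt(n).
rng = np.random.default_rng(0)
n, p = 20, 3
j = sample_couplings(n, p, rng)
v = rng.standard_normal(n)
w = np.sqrt(n) * v / np.linalg.norm(v)
print(p_spin_energy(j, w), np.linalg.norm(riemannian_gradient(j, w)))
```

The dense coupling tensor has $N^p$ entries, so this direct construction is only practical for small $N$; it is meant to make the definitions concrete, not to be efficient.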

Abstract

We present a novel mathematical model that seeks to capture the key design feature of generative adversarial networks (GANs). Our model consists of two interacting spin glasses, and we conduct an extensive theoretical analysis of the complexity of the model's critical points using techniques from Random Matrix Theory. The result is a set of insights into the loss surfaces of large GANs that build upon prior insights for simpler networks and also reveal new structure unique to this setting.

Paper Structure

This paper contains 13 sections, 6 theorems, 128 equations, 9 figures.

Key Result

Theorem 3.1

Let $\mathcal{M}$ be a compact, oriented, $N$-dimensional $C^1$ manifold with a $C^1$ Riemannian metric $g$. Let $\phi:\mathcal{M}\rightarrow\mathbb{R}^N$ and $\psi:\mathcal{M}\rightarrow \mathbb{R}^K$ be random fields on $\mathcal{M}$. For an open set $A\subset\mathbb{R}^K$ for which $\partial A$ … Assume that the following conditions are satisfied for some orthonormal frame field $E$: … Then … where …
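For orientation, the standard form of the conclusion of this Kac-Rice metatheorem (adler2009random, Theorem 12.1.1) is reproduced below. It is written with this document's symbols $\phi$, $\psi$ and $A$, which is an assumption about the notation, and it omits the technical conditions elided above. $N_u$ counts the points where $\phi$ takes the value $u$ while $\psi$ lies in $A$, $p_t$ is the density of $\phi(t)$, and $\nabla\phi_E(t) = (E_i\phi^j(t))_{i,j}$ is the matrix of derivatives in the frame $E$:

```latex
N_u \equiv \#\{\, t\in\mathcal{M} : \phi(t)=u,\ \psi(t)\in A \,\}, \qquad u\in\mathbb{R}^N,

\mathbb{E}\{N_u\} = \int_{\mathcal{M}} \mathbb{E}\left\{ \left|\det\nabla\phi_E(t)\right| \, \mathbb{1}_A\big(\psi(t)\big) \,\middle|\, \phi(t)=u \right\} p_t(u)\, \mathrm{Vol}_g(t).
```

In complexity calculations of this kind one typically takes $\phi$ to be the gradient of the loss and $u=0$, so that $N_0$ counts critical points, with $\psi$ restricting quantities such as the loss value.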

Figures (9)

  • Figure 1: Example spectra of $H'$: empirical spectra from 100 $300\times 300$ matrices and the corresponding limiting spectral densities (LSDs) computed from (\ref{eq:master_quartic}). Here $b=b_1=1$, $\kappa=0.9$, $\sigma_z=1$, and $x_1$ is varied to give three different behaviours.
  • Figure 2: $\Phi$ for $p=q=3, \sigma_z=1, \kappa=0.9$. Red lines show the boundary of the integration region $B$.
  • Figure 3: $\Theta$ and its cross-sections, fixing separately $u_D$ and $u_G$. Here $p=q=3, \sigma_z=1, \kappa=0.9$.
  • Figure 4: Comparison of (\ref{eq:quadrature}) and (\ref{eq:mc_det}), verifying the Coulomb gas approximation numerically. Here $p=q=3, \sigma_z=1, \kappa=0.9$. The matrices sampled for the Monte Carlo (MC) approximation have dimension $N=50$, and $n=50$ MC samples are used.
  • Figure 5: Contour plots of $\Theta_{k_D, k_G}$ for a few values of $k_D, k_G$. Here $p=q=3, \sigma_z=1, \kappa=0.9$.
  • ...and 4 more figures
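Figure 4 checks a Coulomb gas (large-$N$) approximation against Monte Carlo sampling. That approximation replaces a normalised expected log-determinant by the integral of $\log|\lambda - x|$ against a limiting spectral density. The paper's Hessian ensemble is not specified in this summary, so the sketch below substitutes the GOE as an assumption. The GOE's limiting density is the semicircle on $[-2,2]$, and its log-potential has the closed form $x^2/4 - 1/2$ for $|x|\le 2$, with an extra term outside the bulk. The sketch averages $\log|\det|$ over samples (a quenched estimate) rather than taking the log of the averaged determinant. Both converge to the same limit for the GOE but differ at finite $N$. The function names are hypothetical, and $N = n = 50$ is borrowed from the caption.

```python
import numpy as np


def goe(n: int, rng: np.random.Generator) -> np.ndarray:
    """Sample an n x n GOE matrix scaled so its spectrum approaches the semicircle on [-2, 2]."""
    a = rng.standard_normal((n, n))
    # Symmetrise: off-diagonal entries get variance 1, diagonal entries variance 2.
    m = (a + a.T) / np.sqrt(2.0)
    return m / np.sqrt(n)


def coulomb_gas_log_det(x: float) -> float:
    """Closed form of  int log|lambda - x| rho_sc(lambda) d(lambda)  for the semicircle on [-2, 2]."""
    ax = abs(x)
    value = x * x / 4.0 - 0.5
    if ax > 2.0:
        # Correction outside the bulk of the spectrum.
        root = np.sqrt(x * x - 4.0)
        value += -ax * root / 4.0 + np.log((ax + root) / 2.0)
    return float(value)


def mc_log_det(x: float, n: int = 50, samples: int = 50, seed: int = 0) -> float:
    """Monte Carlo estimate of (1/n) E log|det(M - x I)| over GOE samples M."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(samples):
        m = goe(n, rng)
        # slogdet returns (sign, log|det|) and avoids overflow or underflow of det itself.
        _, logabsdet = np.linalg.slogdet(m - x * np.eye(n))
        total += logabsdet / n
    return total / samples


# Usage: compare the two estimates inside and outside the bulk.
for x in (0.0, 1.0, 3.0):
    print(f"x={x}: MC {mc_log_det(x):.3f}, Coulomb gas {coulomb_gas_log_det(x):.3f}")
```

Using `slogdet` keeps the computation stable even when $|\det|$ is astronomically large or small, which is why the per-sample quantity is accumulated on the log scale.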

Theorems & Definitions (11)

  • Theorem 3.1 (adler2009random, Theorem 12.1.1)
  • Lemma 3.2
  • Proof
  • Lemma 3.3
  • Proof
  • Lemma 5.1
  • Proof
  • Lemma 5.2
  • Proof
  • Theorem 5.3
  • ...and 1 more