A Wasserstein perspective of Vanilla GANs

Lea Kunkel; Mathias Trabs

TL;DR

This work reframes Vanilla GANs through a Wasserstein lens by relating the Vanilla GAN distance $\mathsf{V}_{\mathcal{W}}$ to the Wasserstein-1 distance $\mathsf{W}_1$, which yields an oracle inequality that splits the error into approximation and statistical components. A key technical advance is a quantitative approximation result for ReLU networks with bounded Hölder norm, which gives explicit convergence rates depending on the latent dimension $d^*$: roughly $n^{-\alpha/(2d^*)}$ for Vanilla GANs and $n^{-\alpha/d^*}$ for Wasserstein-type GANs, with $\alpha\in(0,1)$. These rates hold under Hölder constraints on the discriminator class, including discriminators given by neural networks. The paper also provides a rigorous pathway to combine neural-network discriminators with the Wasserstein framework, including finite-sample rates and a numerical example illustrating the stability gained from Lipschitz constraints and the ability to detect lower-dimensional manifolds. Overall, the results clarify when Vanilla GANs can achieve dimension-reduction-like performance, how discriminator regularity and the latent dimension influence convergence, and how Vanilla GANs relate theoretically to their Wasserstein counterparts.
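To get a feel for the two rates quoted in the TL;DR, the following minimal Python sketch evaluates $n^{-\alpha/(2d^*)}$ and $n^{-\alpha/d^*}$ for a few sample sizes. It is only arithmetic on the stated exponents: constants and any logarithmic factors are ignored, `n` is assumed to be the number of observations, and the function name `rate` and its parameters are hypothetical, not taken from the paper.

```python
def rate(n: int, alpha: float, d_star: int, vanilla: bool) -> float:
    """Evaluate the rough rate from the TL;DR (constants and log factors ignored).

    vanilla=True  -> n^(-alpha / (2 * d_star))   (Vanilla GAN)
    vanilla=False -> n^(-alpha / d_star)         (Wasserstein-type GAN)
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha is assumed to lie in (0, 1)")
    if n < 1 or d_star < 1:
        raise ValueError("n and d_star must be positive integers")
    exponent = alpha / (2 * d_star) if vanilla else alpha / d_star
    return n ** (-exponent)


if __name__ == "__main__":
    # Illustrative values only: alpha = 0.5 and latent dimension d* = 2 are assumptions.
    for n in (100, 1_000, 10_000):
        v = rate(n, alpha=0.5, d_star=2, vanilla=True)
        w = rate(n, alpha=0.5, d_star=2, vanilla=False)
        print(f"n={n:>6}  vanilla={v:.4f}  wasserstein={w:.4f}")
```

Because the Vanilla GAN exponent is half the Wasserstein-type exponent, the Vanilla GAN value is the square root of the Wasserstein-type value for the same `n`, `alpha`, and `d_star`.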

Abstract

The empirical success of Generative Adversarial Networks (GANs) has led to increasing interest in theoretical research. The statistical literature focuses mainly on Wasserstein GANs and generalizations thereof, which in particular allow for good dimension reduction properties. Statistical results for Vanilla GANs, the original optimization problem, are still rather limited and require assumptions such as smooth activation functions and equal dimensions of the latent space and the ambient space. To bridge this gap, we draw a connection from Vanilla GANs to the Wasserstein distance. By doing so, existing results for Wasserstein GANs can be extended to Vanilla GANs. In particular, we obtain an oracle inequality for Vanilla GANs in Wasserstein distance. The assumptions of this oracle inequality are designed to be satisfied by network architectures commonly used in practice, such as feedforward ReLU networks. By providing a quantitative result for the approximation of a Lipschitz function by a feedforward ReLU network with bounded Hölder norm, we obtain a rate of convergence for Vanilla GANs as well as Wasserstein GANs as estimators of the unknown probability distribution.
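For orientation, the two objects the abstract connects can be written in their standard textbook forms. The block below is a sketch using assumed notation (data distribution $\mathbb{P}^*$, latent distribution $\mathbb{U}$, generator $G$, discriminator $D$); it is not the paper's exact definition of $\mathsf{V}_{\mathcal{W}}$, which is not reproduced in this summary.

```latex
% Original (Vanilla) GAN minimax problem, in its standard form:
\min_{G \in \mathcal{G}} \; \max_{D} \;
  \mathbb{E}_{X \sim \mathbb{P}^*}\bigl[\log D(X)\bigr]
  + \mathbb{E}_{Z \sim \mathbb{U}}\bigl[\log\bigl(1 - D(G(Z))\bigr)\bigr]

% Wasserstein-1 distance via Kantorovich--Rubinstein duality:
\mathsf{W}_1(\mathbb{P}, \mathbb{Q})
  = \sup_{\operatorname{Lip}(W) \le 1}
    \mathbb{E}_{X \sim \mathbb{P}}\bigl[W(X)\bigr]
    - \mathbb{E}_{Y \sim \mathbb{Q}}\bigl[W(Y)\bigr]
```

Both are suprema over a class of test functions, which is what makes a comparison between a Vanilla GAN distance restricted to a class $\mathcal{W}$ and $\mathsf{W}_1$ natural.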

Paper Structure (19 sections, 16 theorems, 114 equations, 3 figures)

Introduction
Our contribution.
Related work.
Outline.
The Vanilla GAN distance
From Vanilla to Wasserstein and back
Oracle inequality for Vanilla GANs in Wasserstein distance
Vanilla GANs with network discriminator
Wasserstein GAN
Numerical illustration
Discussion and limitations
Appendix
Proof for \ref{prelim}
Proofs for \ref{compartibility}
Proofs for \ref{lipschitz_vanilla}
...and 4 more sections

Key Result

Lemma 2.1

Assume that $\mathcal{G}$ is chosen such that a minimum exists. Let $\mathcal{W}$ be symmetric, that is, $W \in \mathcal{W}$ implies $-W\in \mathcal{W}.$ For we have that

Figures (3)

Figure 1: Training of a Vanilla GAN with weight clipping using $100$ observations (first row) or $1000$ observations (second row). Red dots show $1000$ generated samples; green dots show the observations used for training. The blue line is the one-dimensional manifold.
Figure 2: Marginal $\mathsf{W}_1$ distance as a function of the number of observations. The thick line shows the average over $50$ independent runs; ribbons show the first to third quartile.
Figure 3: Percentage of generated samples whose Euclidean distance to the manifold exceeds $\varepsilon$, using $1000$ observations and a discriminator with a $0.5$ clip. Transparent lines show the individual runs; the thick line shows the average over $50$ runs.
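
The quantities plotted in Figures 2 and 3 can be sketched in a few lines of Python. This is not the authors' evaluation code: the function names are hypothetical, the marginal $\mathsf{W}_1$ computation assumes two samples of equal size (in which case the empirical one-dimensional $\mathsf{W}_1$ distance is the mean absolute difference of the sorted values), and the distances to the manifold are assumed to be computed elsewhere.

```python
from typing import Sequence


def marginal_w1(xs: Sequence[Sequence[float]],
                ys: Sequence[Sequence[float]],
                coord: int) -> float:
    """Empirical W1 distance between the `coord`-th marginals of two samples.

    Assumes len(xs) == len(ys); for equal-size one-dimensional samples the
    optimal coupling matches order statistics.
    """
    a = sorted(point[coord] for point in xs)
    b = sorted(point[coord] for point in ys)
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)


def fraction_beyond(distances: Sequence[float], eps: float) -> float:
    """Fraction of samples whose (precomputed) distance to the manifold exceeds eps."""
    if not distances:
        raise ValueError("distances must be non-empty")
    return sum(d > eps for d in distances) / len(distances)
```

For samples of unequal size, a weighted quantile-based computation would be needed instead of the order-statistic matching used here.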

Theorems & Definitions (31)

Lemma 2.1
Lemma 2.2
Theorem 3.1
Theorem 3.2
Example 3.3
Theorem 4.1
Corollary 4.2
Theorem 4.3
Theorem 5.1
Theorem 5.2
...and 21 more
