Wasserstein GANs are Minimax Optimal Distribution Estimators

Arthur Stéphanovitch; Eddie Aamari; Clément Levrard

TL;DR

The paper establishes non-asymptotic, minimax-optimal rates for Wasserstein GAN estimators when the target distribution is the pushforward of the uniform distribution on the latent torus $\mathbb{T}^d$ by a Hölder-smooth map. Under Hölder IPMs, the estimators achieve the rate $O(n^{-\frac{β+γ}{2β+d}}\vee n^{-\frac{1}{2}})$ up to logarithmic factors. The paper introduces a sharp interpolation inequality between Hölder IPMs of measures with densities on (possibly different) manifolds. This inequality yields rates that hold uniformly over the discriminator smoothness $γ$ and supports a tractable GAN estimator in the manifold setting. The work develops a wavelet-based framework to describe regularity and to construct the generator and discriminator classes, and proves minimax optimality for three models: a general low-dimensional model with theoretical discriminators, a full-dimensional density-based model with tractable discriminators, and a manifold-based model that combines tractability with minimax optimality. Together, these results explain why WGANs can achieve minimax-optimal distribution estimation when the data lie on low-dimensional structures. The findings underscore the role of regularity and manifold structure, and point to implementable, minimax-optimal GAN estimators built from wavelet-inspired neural architectures.
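As a quick arithmetic check of this rate, note that for $n\ge 1$ the maximum $n^{-\frac{β+γ}{2β+d}}\vee n^{-\frac{1}{2}}$ equals $n^{-\min(\frac{β+γ}{2β+d},\,\frac{1}{2})}$. The parametric rate $n^{-1/2}$ is therefore reached exactly when $\frac{β+γ}{2β+d}\ge\frac12$, that is, when $γ \ge d/2$. The Python sketch below evaluates this exponent. The function names are illustrative and do not come from the paper, and constants and logarithmic factors are ignored.

```python
def minimax_rate_exponent(beta: float, gamma: float, d: int) -> float:
    """Exponent e such that the rate is n^{-e} (constants and log factors ignored).

    n^{-(beta+gamma)/(2*beta+d)} v n^{-1/2} is the larger, i.e. slower, of the
    two terms, which is the one with the smaller exponent.
    """
    nonparametric = (beta + gamma) / (2 * beta + d)
    return min(nonparametric, 0.5)


def is_parametric_regime(gamma: float, d: int) -> bool:
    """(beta+gamma)/(2*beta+d) >= 1/2 holds exactly when gamma >= d/2."""
    return gamma >= d / 2


def minimax_rate(n: int, beta: float, gamma: float, d: int) -> float:
    """Rate n^{-e}, without constants or logarithmic factors."""
    return n ** (-minimax_rate_exponent(beta, gamma, d))


if __name__ == "__main__":
    # Wasserstein case (gamma = 1) with beta = 1, for a few intrinsic dimensions d.
    for d in (1, 2, 4, 8):
        print(d, minimax_rate_exponent(1.0, 1.0, d), is_parametric_regime(1.0, d))
```

For instance, with $β=γ=1$ (the Wasserstein case) and $d=4$ the exponent is $1/3$, whereas for $d\le 2$ the rate is already parametric.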

Abstract

We provide non-asymptotic rates of convergence for the Wasserstein Generative Adversarial Network (WGAN) estimator. We build neural network classes representing the generators and discriminators that yield a GAN achieving the minimax-optimal rate for estimating a probability measure $μ$ with support in $\mathbb{R}^p$. The measure $μ$ is assumed to be the pushforward of the Lebesgue measure on the $d$-dimensional torus $\mathbb{T}^d$ by a map $g^\star:\mathbb{T}^d\rightarrow \mathbb{R}^p$ of smoothness $β+1$. Measuring the error with the $γ$-Hölder Integral Probability Metric (IPM), we obtain, up to logarithmic factors, the minimax-optimal rate $O(n^{-\frac{β+γ}{2β+d}}\vee n^{-\frac{1}{2}})$. Here $n$ is the sample size, $β$ determines the smoothness of the target measure $μ$, $γ$ is the smoothness of the IPM ($γ=1$ is the Wasserstein case), and $d\leq p$ is the intrinsic dimension of $μ$. Along the way, we derive a sharp interpolation inequality between Hölder IPMs. This new result in the theory of function spaces generalizes classical interpolation inequalities to the case where the measures involved have densities on different manifolds.
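To make the generative model concrete, the following sketch samples from $μ = g^\star_\# \mathrm{Leb}_{\mathbb{T}^d}$ with $d=2$ and $p=3$. The map `g_star` is only an illustrative smooth choice and does not come from the paper. It is the standard embedding of the torus in $\mathbb{R}^3$, with hypothetical radii $R=2$ and $r=0.5$, and it represents $\mathbb{T}^2$ by $[0,1)^2$ with periodic coordinates.

```python
import numpy as np


def g_star(u: np.ndarray, R: float = 2.0, r: float = 0.5) -> np.ndarray:
    """Hypothetical smooth map T^2 -> R^3: the standard torus embedding (R > r > 0).

    `u` has shape (n, 2) with entries in [0, 1); both coordinates are periodic,
    so [0, 1)^2 represents the flat torus T^2.
    """
    theta = 2 * np.pi * u[:, 0]
    phi = 2 * np.pi * u[:, 1]
    x = (R + r * np.cos(phi)) * np.cos(theta)
    y = (R + r * np.cos(phi)) * np.sin(theta)
    z = r * np.sin(phi)
    return np.stack([x, y, z], axis=1)


def sample_pushforward(n: int, seed: int = 0) -> np.ndarray:
    """Draw n i.i.d. samples X_i = g_star(U_i) with U_i uniform (Lebesgue) on T^2."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 1.0, size=(n, 2))
    return g_star(u)


X = sample_pushforward(1000)
# Every sample lies on the 2-dimensional torus surface: (sqrt(x^2 + y^2) - R)^2 + z^2 = r^2.
rho = np.hypot(X[:, 0], X[:, 1])
print(X.shape, np.allclose((rho - 2.0) ** 2 + X[:, 2] ** 2, 0.5 ** 2))
```

The samples live on a 2-dimensional surface in $\mathbb{R}^3$, so here $d=2<p=3$. The measure $μ$ is not the uniform surface measure on that torus. The area element of the embedding is proportional to $r(R+r\cos φ)$, so the pushforward of the flat Lebesgue measure places relatively more mass on the inner side of the torus, where that element is smallest.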

Paper Structure (70 sections, 53 theorems, 433 equations, 1 table)

Introduction
Overview of the main results and comparison with other works
Notation
Overview of the main results
A (too) general low dimensional model
A full dimensional density-based model
A non-degenerate manifold model
Comparison with other works
Preliminary tools
Bias-variance trade-off for GANs
A general bound
Expected growth of optimal generators and discriminators classes
Wavelets: a key tool to describe regularity
Wavelets, Besov spaces and regularity trade-off
From GANs to wavelets and back
...and 55 more sections

Key Result

Theorem 3.1

If $g^\star \in \mathcal{H}^{\beta+1}_K(\mathbb{T}^d,\mathbb{R}^p)$ and $\mathcal{G}\subset \mathcal{H}^{\beta+1}_K(\mathbb{T}^d,\mathbb{R}^p)$, $\mathcal{D} \subset \mathcal{H}^{\gamma}_1(B^p(0,K),\mathbb{R})$, then the GAN estimator WGANS satisfies
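Theorem 3.1 concerns a WGAN estimator built from a generator class $\mathcal{G}$ and a discriminator class $\mathcal{D}$. The toy sketch below assumes the standard WGAN form $\hat g \in \arg\min_{g\in\mathcal{G}} \sup_{D\in\mathcal{D}} \big(\mathbb{E}_{U}[D(g(U))] - \frac{1}{n}\sum_{i=1}^n D(X_i)\big)$, with $U$ uniform on $[0,1)$ and $d=p=1$. It replaces the paper's neural network classes with small, hypothetical, finite classes: linear generators $g_θ(u)=θu$ on a grid of $θ$ values, and the 1-Lipschitz discriminators $x\mapsto \pm|x-t|$. It also ignores the periodicity of the torus. It is meant only to show the min-max structure and does not reproduce the paper's construction.

```python
import numpy as np


def empirical_ipm(gen_samples, data, discriminators):
    """sup over a finite discriminator class of E_gen[f] - (1/n) sum_i f(X_i)."""
    return max(f(gen_samples).mean() - f(data).mean() for f in discriminators)


def wgan_estimator(data, thetas, discriminators, m=4000):
    """Toy WGAN: pick theta minimizing the empirical IPM for g_theta(u) = theta * u."""
    # Midpoint grid on [0, 1): a deterministic approximation of E over U ~ Uniform[0, 1).
    u = (np.arange(m) + 0.5) / m
    losses = [empirical_ipm(theta * u, data, discriminators) for theta in thetas]
    return thetas[int(np.argmin(losses))]


# Hypothetical 1-Lipschitz discriminators x -> |x - t| and x -> -|x - t|.
ts = np.linspace(0.0, 1.0, 11)
discriminators = [lambda x, t=t: np.abs(x - t) for t in ts]
discriminators += [lambda x, t=t: -np.abs(x - t) for t in ts]

rng = np.random.default_rng(0)
theta_star = 0.7
data = theta_star * rng.uniform(0.0, 1.0, size=2000)  # n = 2000 samples X_i = g*(U_i)
thetas = np.linspace(0.1, 1.5, 29)  # candidate generators, grid step 0.05
theta_hat = wgan_estimator(data, thetas, discriminators)
print(round(float(theta_hat), 2))
```

Because this discriminator class is closed under negation, the inner supremum equals the absolute discrepancy $\max_{D\in\mathcal{D}}\big|\mathbb{E}_U[D(g(U))]-\frac{1}{n}\sum_i D(X_i)\big|$, which is an empirical IPM over $\mathcal{D}$. With data generated from $θ^\star=0.7$, the selected $\hat θ$ should land near $0.7$.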

Theorems & Definitions (102)

Definition 2.1
Theorem 3.1
Proposition 3.2
Lemma 3.3
Definition 3.4
Lemma 3.5
Proposition 3.6
Proposition 3.7
Theorem 4.1
Corollary 4.2
...and 92 more