Statistically Optimal Generative Modeling with Maximum Deviation from the Empirical Distribution

Elen Vardanyan; Sona Hunanyan; Tigran Galstyan; Arshak Minasyan; Arnak Dalalyan

TL;DR

The paper addresses the risk that learned generative models replicate training data by introducing Left-Inverse Penalized ERM (LIPERM), which penalizes the lack of a smooth left inverse in the generator map. By optimizing a penalized objective that combines a distance to the empirical distribution with a left-inverse penalty, the authors derive non-asymptotic lower and upper bounds on Wasserstein and IPM distances, showing that the learned generator can substantially deviate from the empirical distribution while remaining statistically optimal. The core results establish a finite-sample $n^{-1/d}$ rate that is independent of ambient dimension $D$, and they provide minimax-optimality statements and robustness to function-class approximations. Empirical evaluations on Swiss Roll, MNIST, and CIFAR-10 corroborate that LIPERM can produce diverse outputs without noticeable replication and that increasing the left-inverse penalty preserves sample quality, highlighting practical viability and limitations such as computational challenges and latent-space design.
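As a rough illustration of the penalized objective described above, the sketch below assumes the penalty takes the form $\mathbb{E}\,|h(g(U)) - U|^2$ for a candidate left inverse $h$, and that the objective is the fit term plus $\lambda$ times this penalty. The summary does not state the exact form, so both are assumptions. The spiral generator, its inverse, and all function names are hypothetical and chosen only for the example.

```python
import numpy as np

def spiral_generator(u):
    """Hypothetical generator g: [0, 1] -> R^2 tracing a spiral (1D latent space)."""
    theta = 3.0 * np.pi * u      # angle grows with the latent coordinate
    radius = 0.5 + u             # strictly increasing radius makes g injective
    return np.stack([radius * np.cos(theta), radius * np.sin(theta)], axis=-1)

def spiral_left_inverse(x):
    """Exact left inverse h of the spiral: recover u from the radius."""
    return np.linalg.norm(x, axis=-1) - 0.5

def collapsed_generator(u):
    """Degenerate generator sending every latent point to the origin."""
    return np.zeros((u.shape[0], 2))

def left_inverse_penalty(g, h, u):
    """Monte Carlo estimate of E|h(g(U)) - U|^2 (ASSUMED form of the penalty)."""
    return float(np.mean((h(g(u)) - u) ** 2))

def penalized_objective(fit_term, penalty, lam):
    """ASSUMED form: distance to the empirical distribution + lam * penalty."""
    return fit_term + lam * penalty

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=1000)

# Near zero: h inverts g exactly, up to floating-point error.
pen_spiral = left_inverse_penalty(spiral_generator, spiral_left_inverse, u)
# Large: a constant map cannot be inverted by any h.
pen_collapsed = left_inverse_penalty(collapsed_generator, spiral_left_inverse, u)
```

Here $h$ is fixed for simplicity; a penalty of this kind would more naturally be minimized over a class of candidate inverses. For the collapsed generator, $h \circ g$ is constant whatever $h$ is, so the penalty can never fall below $\mathrm{Var}(U) = 1/12$.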

Abstract

This paper explores the problem of generative modeling, aiming to simulate diverse examples from an unknown distribution based on observed examples. While recent studies have focused on quantifying the statistical precision of popular algorithms, there is a lack of mathematical evaluation regarding the non-replication of observed examples and the creativity of the generative model. We present theoretical insights into this aspect, demonstrating that the Wasserstein GAN, constrained to left-invertible push-forward maps, generates distributions that avoid replication and significantly deviate from the empirical distribution. Importantly, we show that left-invertibility achieves this without compromising the statistical optimality of the resulting generator. Our most important contribution provides a finite-sample lower bound on the Wasserstein-1 distance between the generative distribution and the empirical one. We also establish a finite-sample upper bound on the distance between the generative distribution and the true data-generating one. Both bounds are explicit and show the impact of key parameters such as sample size, dimensions of the ambient and latent spaces, noise level, and smoothness measured by the Lipschitz constant.
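To give a feel for why a distribution without atoms must stay at a distance from any $n$-point discrete distribution, here is a toy one-dimensional check ($d = 1$). It is not the paper's bound or its constants. It uses the standard fact that, on the real line, $W_1$ equals the integral of the absolute difference of the two CDFs, and that the best $n$-atom approximation of the uniform distribution on $[0,1]$ has $W_1 = 1/(4n)$. The function name and grid size are illustrative choices.

```python
import numpy as np

def w1_uniform_vs_atoms(atoms, weights, grid_size=200_001):
    """W1 between U[0,1] and a discrete distribution supported in [0,1].

    Uses the 1D identity W1 = integral of |F(x) - G(x)| dx, approximated
    with the trapezoid rule on a fine grid.
    """
    atoms = np.asarray(atoms, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(atoms)
    atoms, weights = atoms[order], weights[order]
    x = np.linspace(0.0, 1.0, grid_size)
    cum = np.concatenate([[0.0], np.cumsum(weights)])
    # G(x) = total weight of atoms <= x (CDF of the discrete distribution).
    G = cum[np.searchsorted(atoms, x, side="right")]
    y = np.abs(x - G)            # F(x) = x for the uniform distribution
    return float(np.sum((y[1:] + y[:-1]) / 2.0) * (x[1] - x[0]))

def midpoint_atoms(n):
    """Atoms at the cell midpoints (2i - 1) / (2n), the optimal n-point choice."""
    return (2.0 * np.arange(1, n + 1) - 1.0) / (2.0 * n)

n = 10
best = w1_uniform_vs_atoms(midpoint_atoms(n), np.full(n, 1.0 / n))

# An "empirical distribution": n i.i.d. uniform draws with equal weights.
rng = np.random.default_rng(0)
empirical = w1_uniform_vs_atoms(rng.uniform(size=n), np.full(n, 1.0 / n))
```

In this toy setting `best` is close to $1/(4n)$, and no $n$-atom distribution, the empirical one included, can do better. This matches the $n^{-1/d}$ scaling with $d = 1$.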

Paper Structure

This paper contains 22 sections, 12 theorems, 73 equations, 9 figures, 2 tables, 1 algorithm.

Introduction
Left-Inverse-Penalized Empirical Risk
Main Result: Deviation from the Empirical Distribution
Warm-up: The Case of Hard Constraint ($\lambda=\infty$)
The Case of Soft Constraint: $\lambda\in(0,\infty)$
Precision of Left-Inverse-Penalized ERM
Handling Functional Approximations
Numerical Experiments
Summary, Conclusion and Limitations
Proof of the upper bound on the risk
Proof of Theorem \ref{thm:upper}
Proofs for the deviation of the generative distribution from the empirical distribution
Lower bounding the distance between the uniform distribution and any discrete distribution
Proof of Theorem \ref{thm:lower-bound-hard}
Proof of Theorem \ref{thm:lower-bound-pen}
...and 7 more sections

Key Result

Lemma 1

Any distribution $P_g = g\sharp\,\mathcal{U}_d$ defined by a push-forward map $g\in \mathcal{G}_{\mathcal{H}}$ has no atom. In particular, it satisfies $P_g(\{\boldsymbol{X}_1,\ldots,\boldsymbol{X}_n\}) = 0$.
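The reasoning behind this statement can be sketched in a few lines. The sketch assumes that $\mathcal{G}_{\mathcal{H}}$ is the class of maps $g$ admitting a left inverse $h\in\mathcal{H}$, that is, $h(g(u)) = u$ for every latent point $u$, and that $\mathcal{U}_d$ is the uniform distribution on $[0,1]^d$. Neither definition is spelled out in this summary. Under these assumptions $g$ is injective, so the preimage of any single point contains at most one latent point:

```latex
\begin{align*}
P_g(\{x\})
  &= \mathcal{U}_d\big(g^{-1}(\{x\})\big)
  && \text{(definition of the push-forward)} \\
  &= 0
  && \text{($g^{-1}(\{x\})$ has at most one point, a Lebesgue-null set)} \\
P_g(\{\boldsymbol{X}_1,\ldots,\boldsymbol{X}_n\})
  &\le \sum_{i=1}^{n} P_g(\{\boldsymbol{X}_i\}) = 0
  && \text{(union bound).}
\end{align*}
```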

Figures (9)

Figure 1: Illustration of the framework of this paper: generating points on a 2D spiral using a 1D latent space. The green arrows represent the mapping $g:[0,1]\to \mathbb R^2$. Each arrow indicates how points from the latent space are mapped to positions in the 2D spiral.
Figure 2: Handwritten digits generated by LIPERM, from left to right: $\lambda = 0,1,4,8$. See Fig. \ref{fig:mnist2} for higher-resolution images.
Figure 3: LIPERM on MNIST data. The behavior of the generator loss and of the left-inverse penalty across the iterations.
Figure 4: Swiss Roll: samples generated by LIPERM WGAN (blue) and original data (orange) for $\lambda = 0,1,4,8$ (left to right).
Figure 5: Experimental results on CIFAR-10 data set. Left: the evolution of the Inception Score across the iterations. Middle: the evolution of the generator loss across the iterations, for various values of $\lambda$. Right: the evolution of the left-inverse penalty across the iterations, for various values of $\lambda$.
...and 4 more figures

Theorems & Definitions (23)

Lemma 1
Proof of Lemma \ref{lem:1a}
Proposition 1
Theorem 1
Corollary 1
Remark 1
Theorem 2
Remark 2
Remark 3
Remark 4
...and 13 more
