Statistically Optimal Generative Modeling with Maximum Deviation from the Empirical Distribution
Elen Vardanyan, Sona Hunanyan, Tigran Galstyan, Arshak Minasyan, Arnak Dalalyan
TL;DR
The paper addresses the risk that learned generative models replicate their training data. It introduces Left-Inverse Penalized ERM (LIPERM), which penalizes generator maps that lack a smooth left inverse. By optimizing a penalized objective that combines a distance to the empirical distribution with this left-inverse penalty, the authors derive non-asymptotic lower and upper bounds in Wasserstein and IPM distances. These bounds show that the learned generator can deviate substantially from the empirical distribution while remaining statistically optimal. The core results establish a finite-sample $n^{-1/d}$ rate, with $d$ the latent dimension, that does not depend on the ambient dimension $D$; the authors also provide minimax-optimality statements and robustness to function-class approximations. Empirical evaluations on Swiss Roll, MNIST, and CIFAR-10 corroborate that LIPERM can produce diverse outputs without noticeable replication and that increasing the left-inverse penalty preserves sample quality. Together these results highlight the method's practical viability as well as its limitations, such as computational challenges and the design of the latent space.
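The penalized objective described above can be illustrated on a toy problem. This is a minimal sketch under stated assumptions, not the authors' LIPERM implementation: the names `w1_equal_size`, `left_inverse_penalty`, `penalized_objective`, `copy_gen` and `smooth_gen` are hypothetical; the latent space is one-dimensional; the distance to the empirical distribution is the exact Wasserstein-1 distance, computed by brute force over matchings (feasible only for a handful of points); and the left-inverse penalty is an assumed proxy, namely the Lipschitz constant that a left inverse of the generator would need on the sampled latent points.

```python
import itertools
import math


def w1_equal_size(xs, ys):
    """Exact Wasserstein-1 distance between two uniform empirical measures
    with the same number of points. By the Birkhoff-von Neumann theorem an
    optimal plan is a permutation, so all matchings are brute-forced
    (only feasible for a handful of points)."""
    n = len(xs)
    assert n == len(ys), "both samples must have the same size"
    best = math.inf
    for perm in itertools.permutations(range(n)):
        cost = sum(math.dist(xs[i], ys[perm[i]]) for i in range(n)) / n
        best = min(best, cost)
    return best


def left_inverse_penalty(generator, latents):
    """Hypothetical proxy for 'lack of a smooth left inverse': the largest
    ratio |u - v| / ||g(u) - g(v)|| over pairs of distinct latent points,
    i.e. the Lipschitz constant a left inverse of g would need on them.
    It is infinite when g maps two distinct latent points to one output."""
    worst = 0.0
    for u, v in itertools.combinations(latents, 2):
        if u == v:
            continue  # identical latent points impose no constraint
        gap = math.dist(generator(u), generator(v))
        if gap == 0.0:
            return math.inf
        worst = max(worst, abs(u - v) / gap)
    return worst


def penalized_objective(generator, latents, targets, lam):
    """Distance to the empirical distribution plus lam times the penalty."""
    samples = [generator(u) for u in latents]
    fit = w1_equal_size(samples, targets)
    if lam == 0.0:
        return fit  # unpenalized case; also avoids 0 * inf = nan
    return fit + lam * left_inverse_penalty(generator, latents)


# Toy data: two training points in R^2 (D = 2), one-dimensional latents (d = 1).
data = [(0.0, 0.0), (1.0, 0.0)]
latents = [0.0, 0.1, 0.9, 1.0]
# Repeating each training point twice leaves the uniform empirical
# distribution unchanged and matches the number of latent samples.
targets = data * 2


def copy_gen(u):
    # Snaps each latent point in [0, 1] onto the nearest training point.
    return (float(round(u)), 0.0)


def smooth_gen(u):
    # Isometric embedding of the latent line; its left inverse is 1-Lipschitz.
    return (u, 0.0)


for lam in (0.0, 0.1):
    print(lam,
          penalized_objective(copy_gen, latents, targets, lam),
          penalized_objective(smooth_gen, latents, targets, lam))
```

With `lam = 0` the copying generator reaches objective 0 and beats the smooth one (about 0.05). For any `lam > 0` the copier merges distinct latent points, so its proxy penalty is infinite and the smooth generator wins even though it does not reproduce the training points.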
Abstract
This paper explores the problem of generative modeling, which aims to simulate diverse examples from an unknown distribution based on observed examples. While recent studies have focused on quantifying the statistical precision of popular algorithms, there has been little mathematical analysis of whether a generative model avoids replicating the observed examples, or of its creativity. We present theoretical insights into this question, demonstrating that the Wasserstein GAN, constrained to left-invertible push-forward maps, generates distributions that avoid replication and deviate significantly from the empirical distribution. Importantly, we show that left-invertibility achieves this without compromising the statistical optimality of the resulting generator. Our main contribution is a finite-sample lower bound on the Wasserstein-1 distance between the generative distribution and the empirical one. We also establish a finite-sample upper bound on the distance between the generative distribution and the true data-generating one. Both bounds are explicit and show the impact of key parameters such as the sample size, the dimensions of the ambient and latent spaces, the noise level, and the smoothness measured by the Lipschitz constant.
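A one-dimensional toy computation shows why a generative distribution with a density cannot coincide with the empirical distribution. This is not the paper's lower bound, which depends on $D$, $d$, the noise level and the Lipschitz constant. Assume the generative distribution is Uniform[0, 1] and the empirical distribution puts mass $1/n$ on each of $n$ observed points. In one dimension W1 has the quantile form $\int_0^1 |F^{-1}(s) - F_n^{-1}(s)|\,ds$, and moving a block of mass $1/n$ onto a single point costs at least $1/(4n^2)$. Hence $W_1 \ge 1/(4n)$ wherever the points lie, whereas a generator that replicated the sample would sit at distance 0. The function names below are hypothetical.

```python
import random


def abs_integral(a, b, c):
    """Closed form of the integral of |s - c| over [a, b], assuming a <= b."""
    if c <= a:
        return ((b - c) ** 2 - (a - c) ** 2) / 2
    if c >= b:
        return ((c - a) ** 2 - (c - b) ** 2) / 2
    return ((c - a) ** 2 + (b - c) ** 2) / 2


def w1_uniform_vs_empirical(points):
    """Exact W1 between Uniform[0, 1] and the empirical measure of `points`,
    via the 1D quantile formula W1 = int_0^1 |F^{-1}(s) - F_n^{-1}(s)| ds:
    on the i-th block of mass 1/n the empirical quantile is the i-th
    smallest point, while the uniform quantile is s itself."""
    xs = sorted(points)
    n = len(xs)
    return sum(abs_integral(i / n, (i + 1) / n, x) for i, x in enumerate(xs))


# Grid midpoints are the best n-point approximation: W1 = 1/(4n).
print(w1_uniform_vs_empirical([0.125, 0.375, 0.625, 0.875]))  # 0.0625 = 1/16

# Random samples never get closer than 1/(4n).
rng = random.Random(0)
for n in (10, 100, 1000):
    sample = [rng.random() for _ in range(n)]
    print(n, w1_uniform_vs_empirical(sample), 1 / (4 * n))
```

In this toy case the unavoidable deviation is of order $1/n$. How such a deviation scales with $n$, $d$, $D$ and the other parameters in general is what the paper's finite-sample lower bound quantifies.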
