Making Method of Moments Great Again? -- How can GANs learn distributions
Yuanzhi Li, Zehao Dou
TL;DR
The paper analyzes the early training dynamics of Wasserstein GANs, arguing that the discriminator acts as a moment-matcher and that matching a small number of low-order moments suffices to learn a broad class of target distributions, including those generated by two-layer neural networks. Leveraging moment-based identification and tensor decomposition (via the CE-Matrix and a 3-generic condition), it proves that a two-layer learner can recover the ground-truth distribution with polynomial sample complexity and provides an algorithmic pathway to do so. Extensions to ReLU discriminators and higher-order activations are provided, along with experimental evidence showing the necessity of higher-order moments for successful learning. The work suggests a principled route to understanding GANs’ capacity to approximate complex distributions beyond local equilibria, with explicit rates and identifiability guarantees. Overall, it connects moment matching, tensor methods, and Wasserstein training to establish theoretical learnability results for structured generator/discriminator pairs.
Abstract
Generative Adversarial Networks (GANs) are widely used models for learning complex real-world distributions. In GANs, training of the generator usually stops when the discriminator can no longer distinguish the generator's output from the training examples. A central question is whether, when training stops, the generated distribution is actually close to the target distribution, and how the training process reaches such configurations efficiently. In this paper, we establish theoretical results towards understanding this generator-discriminator training process. We empirically observe that during the early stage of GAN training, the discriminator tries to force the generator to match the low-degree moments of the generator's output to those of the target distribution. Moreover, we prove that merely by matching these empirical moments over polynomially many training examples, the generator can already learn a notable class of distributions, including those that can be generated by two-layer neural networks.
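To make "matching low-degree empirical moments" concrete, the sketch below estimates the moment tensors E[x], E[x ⊗ x], E[x ⊗ x ⊗ x] from samples and measures the per-degree gap between two sample sets. It is only an illustration of the quantity being matched, not the paper's algorithm; the function names `empirical_moments` and `moment_gaps`, the choice of raw (uncentered) moments, and the Frobenius-norm gap are all assumptions made for this example.

```python
import numpy as np


def empirical_moments(samples, max_degree=3):
    """Return [M_1, ..., M_k], where M_j is the empirical j-th raw moment
    tensor (the average of the j-fold outer product of each sample with itself).

    `samples` has shape (n, d); M_j has shape (d,) * j.
    """
    samples = np.asarray(samples, dtype=float)
    n, d = samples.shape
    moments = []
    current = np.ones(n)  # degree-0 "tensor" for each sample
    for j in range(max_degree):
        # Append one more copy of x to the outer product, per sample.
        # current: (n, d, ..., d) with j trailing axes of size d.
        factor = samples.reshape((n,) + (1,) * j + (d,))
        current = current[..., None] * factor
        moments.append(current.mean(axis=0))
    return moments


def moment_gaps(generated, target, max_degree=3):
    """Frobenius-norm gap between empirical moments, one value per degree.

    The norm is an illustrative choice, not one taken from the paper.
    """
    gen = empirical_moments(generated, max_degree)
    tgt = empirical_moments(target, max_degree)
    return [float(np.linalg.norm(g - t)) for g, t in zip(gen, tgt)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(size=(5000, 2))
    # Hypothetical "generator" whose first coordinate is off by a constant shift.
    generated = rng.normal(size=(5000, 2)) + np.array([0.5, 0.0])
    for degree, gap in enumerate(moment_gaps(generated, target), start=1):
        print(f"degree {degree}: gap = {gap:.3f}")
```

In this picture, a generator is "matched" when every entry of `moment_gaps` is small. The number of tensor entries grows as d^k with the degree k, which is one reason a result that needs only low-degree moments is attractive.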
