Double Descent and Other Interpolation Phenomena in GANs
Lorenzo Luzi, Yehuda Dar, Richard Baraniuk
TL;DR
This work studies how overparameterization affects generalization in GANs, treating the latent-space dimension as the main source of parameterization. It shows that training GANs by minimizing a distribution metric or $f$-divergence yields the same test error across all interpolating solutions, whereas a pseudo-supervised scheme, which pairs fabricated latent vectors with real output samples, induces double (and sometimes triple) descent and speeds up training. The authors develop several pseudo-supervised formulations, establish theoretical properties of the solution sets, and report empirical gains for linear and nonlinear GANs (including on MNIST and CelebA) in the form of faster convergence and improved generalization. The work points to a practical way to exploit overparameterization in unsupervised generative modeling and motivates further study of pseudo-supervision in large-scale settings.
Abstract
We study overparameterization in generative adversarial networks (GANs) that can interpolate the training data. We show that overparameterization can improve generalization performance and accelerate the training process. We study the generalization error as a function of the latent space dimension and identify two main behaviors, depending on the learning setting. First, we show that overparameterized generative models that learn distributions by minimizing a metric or $f$-divergence do not exhibit double descent in generalization errors; specifically, all the interpolating solutions achieve the same generalization error. Second, we develop a novel pseudo-supervised learning approach for GANs in which training uses pairs of fabricated (noise) inputs and real output samples. Our pseudo-supervised setting exhibits double descent (and, in some cases, triple descent) of generalization errors. We combine pseudo-supervision with overparameterization (i.e., an overly large latent space dimension) to accelerate training while matching or even surpassing the generalization performance obtained without pseudo-supervision. While our analysis focuses mostly on linear models, we also apply its key insights to improve the generalization of nonlinear, multilayer GANs.
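To make the pseudo-supervised idea concrete, the following is a minimal sketch for a linear generator, not the authors' exact formulation. It assumes that each real sample is paired with an i.i.d. Gaussian latent vector and that the generator is fit by minimum-norm least squares; the function name `fit_pseudo_supervised_linear` and all dimensions are hypothetical choices made for illustration.

```python
import numpy as np

def fit_pseudo_supervised_linear(X, latent_dim, rng):
    """Fit a linear generator x ~ W z on fabricated (z_i, x_i) pairs.

    X: (n, m) array of real samples.
    Returns (W, Z), where W has shape (m, latent_dim) and Z has
    shape (n, latent_dim).
    """
    n = X.shape[0]
    # Fabricated latent inputs: one Gaussian vector per real sample
    # (an assumption of this sketch).
    Z = rng.standard_normal((n, latent_dim))
    # Minimum-norm least-squares solution of Z @ W.T = X.
    W = (np.linalg.pinv(Z) @ X).T
    return W, Z

rng = np.random.default_rng(0)
n, m = 20, 5                      # number of samples, data dimension
X = rng.standard_normal((n, m))   # synthetic stand-in for real data

train_mse = {}
for d in (5, 40):                 # latent dimension below and above n
    W, Z = fit_pseudo_supervised_linear(X, d, rng)
    train_mse[d] = float(np.mean((Z @ W.T - X) ** 2))
```

In this sketch the generator can interpolate the fabricated pairs once the latent dimension is at least the number of training samples, so `train_mse[40]` is numerically zero while `train_mse[5]` is not. The sketch only illustrates the interpolation threshold; it does not measure generalization error or reproduce the descent curves discussed in the paper.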
