Bridging GANs and Bayesian Neural Networks via Partial Stochasticity
Maurizio Filippone, Marius P. Linhard
TL;DR
The paper reframes GANs as partially stochastic neural samplers within a Bayesian-like framework, showing that, given sufficient latent dimensionality, the model is a universal approximator of absolutely continuous distributions. It shows that optimizing the intractable marginalized likelihood can be treated as a distribution-matching problem and presents practical proxies based on divergences and integral probability metrics (IPMs), many of which rely on a discriminator as an essential component. To address overfitting and optimization difficulties, the authors propose regularization techniques (likelihood relaxation, gradient regularization) and strategies to locate flat minima (small batches, sharpness-aware minimization (SAM)), along with approximate inference via dropout. Theoretical results establish universal approximation guarantees for Partially Stochastic Networks, and extensive experiments on benchmarks (MNIST, CIFAR-10, FFHQ, CelebA) show performance gains in several settings, particularly when regularization and flat-minima strategies are employed. Overall, the work provides a principled bridge between GANs and Bayesian neural networks, offering actionable methods to improve stability and generalization, and a clearer understanding of GAN optimization.
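To make the distribution-matching view concrete, the sketch below computes one well-known IPM-style quantity, the biased (V-statistic) estimate of the squared maximum mean discrepancy (MMD) with a Gaussian kernel, between generated samples and data samples. It is a minimal illustration under assumed choices: the kernel, the `bandwidth` parameter and the function names `gaussian_kernel` and `mmd2_biased` are hypothetical and not taken from the paper, whose proxies may be defined differently (for example, with a learned discriminator instead of a fixed kernel).

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two vectors given as lists of floats.

    The kernel choice and bandwidth are illustrative assumptions.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between sample sets xs and ys.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]; the biased estimator
    averages over all pairs, including i == j, so it is never negative.
    """
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.gauss(0.0, 1.0)] for _ in range(200)]   # "true" samples
    close = [[rng.gauss(0.0, 1.0)] for _ in range(200)]  # well-matched generator
    far = [[rng.gauss(3.0, 1.0)] for _ in range(200)]    # mismatched generator
    print(mmd2_biased(close, data))  # small value
    print(mmd2_biased(far, data))    # noticeably larger value
```

A generator whose samples match the data distribution yields a value near zero, while a shifted generator yields a clearly larger one; minimizing such a quantity with respect to generator parameters is the sense in which the likelihood proxy becomes a distribution-matching objective.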
Abstract
Generative Adversarial Networks (GANs) are popular and successful generative models. Despite this success, optimizing them is notoriously challenging. In this work, we explain the success and limitations of GANs by casting them as Bayesian neural networks with partial stochasticity. This interpretation allows us to establish conditions of universal approximation and to rewrite the adversarial-style optimization of several variants of GANs as the optimization of a proxy for the likelihood obtained by marginalizing out the stochastic variables. Following this interpretation, the need for regularization becomes apparent, and we propose to adopt strategies to smooth the loss landscape and methods to search for solutions with minimum description length, which are associated with flat minima and good generalization. Results obtained on a wide range of experiments indicate that these strategies lead to performance improvements and pave the way to a deeper understanding of GANs.
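Sharpness-aware minimization (SAM), one of the flat-minima strategies named in the TL;DR, replaces the plain gradient with the gradient evaluated at an adversarially perturbed point inside an L2 ball of radius `rho`. The sketch below shows only the mechanics of a single SAM update in plain Python; the toy quadratic loss, the learning rate, `rho`, and the names `sam_step` and `quad_grad` are illustrative assumptions, not the paper's training setup.

```python
import math


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware minimization (SAM) update for a parameter list w.

    grad_fn(w) must return the gradient of the loss at w as a list of floats.
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12  # avoid division by zero
    # Ascent step: first-order worst-case point within an L2 ball of radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent step: apply the gradient taken at the perturbed point to the original weights.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def quad_grad(w, a=10.0, b=1.0):
    """Gradient of the toy anisotropic loss L(w) = 0.5 * (a * w0^2 + b * w1^2)."""
    return [a * w[0], b * w[1]]


if __name__ == "__main__":
    w = [1.0, 1.0]
    for _ in range(100):
        w = sam_step(w, quad_grad, lr=0.05, rho=0.05)
    print(w)  # close to the minimum at the origin
```

With `rho=0` the update reduces to ordinary gradient descent; a positive `rho` penalizes directions where the loss rises quickly, which is why SAM tends to favor flat minima. A quadratic has a single minimum, so this toy run demonstrates the update rule rather than any generalization benefit.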
