Bridging GANs and Bayesian Neural Networks via Partial Stochasticity

Maurizio Filippone, Marius P. Linhard

TL;DR

The paper reframes GANs as partially stochastic neural samplers within a Bayesian-like framework, showing that with sufficient latent dimensionality the model is a universal approximator of absolutely continuous distributions. It derives that the intractable marginalized likelihood can be treated as a distribution-matching problem and presents practical proxies using divergences and IPMs, with discriminators as essential components in many formulations. To address overfitting and optimization difficulties, the authors propose regularization techniques (likelihood relaxation, gradient regularization) and strategies to locate flat minima (small batches, SAM) along with approximate inference via dropout. Theoretical results establish universal approximation guarantees for Partially Stochastic Networks, and extensive experiments on benchmarks (MNIST, CIFAR-10, FFHQ, CelebA) demonstrate performance gains in several settings, particularly when regularization and flat-minima strategies are employed. Overall, the work provides a principled bridge between GANs and Bayesian neural networks, offering actionable methods to improve stability, generalization, and understanding of GAN optimization.
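The marginalization mentioned above can be sketched as follows. The notation is assumed for illustration, not quoted from the paper: $\mathbf{z}$ is the latent variable with prior $p(\mathbf{z})$, $\boldsymbol{\psi}$ are the deterministic generator parameters, and the Dirac likelihood is this sketch's way of expressing a deterministic generator.

$$
p(\mathbf{x} \mid \boldsymbol{\psi}) = \int p(\mathbf{x} \mid \mathbf{z}, \boldsymbol{\psi})\, p(\mathbf{z})\, d\mathbf{z},
\qquad
p(\mathbf{x} \mid \mathbf{z}, \boldsymbol{\psi}) = \delta\big(\mathbf{x} - \mathbf{f}_{\mathrm{gen}}(\mathbf{z}, \boldsymbol{\psi})\big)
$$

Under these assumptions the integral has no closed form for a neural $\mathbf{f}_{\mathrm{gen}}$, which is consistent with the summary's point that the likelihood is replaced by a distribution-matching proxy (a divergence or IPM between generated and observed samples).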

Abstract

Generative Adversarial Networks (GANs) are popular and successful generative models. Despite their success, optimization is notoriously challenging. In this work, we explain the success and limitations of GANs by casting them as Bayesian neural networks with partial stochasticity. This interpretation allows us to establish conditions of universal approximation and to rewrite the adversarial-style optimization of several variants of GANs as the optimization of a proxy for the likelihood obtained by marginalizing out the stochastic variables. Following this interpretation, the need for regularization becomes apparent, and we propose to adopt strategies to smooth the loss landscape and methods to search for solutions with minimum description length, which are associated with flat minima and good generalization. Results obtained on a wide range of experiments indicate that these strategies lead to performance improvements and pave the way to a deeper understanding of GANs.
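One of the flat-minima strategies named in the summary is sharpness-aware minimization (SAM). The snippet below is a minimal sketch of the generic SAM update on a toy quadratic loss, not the authors' training code: the loss, the learning rate `lr`, and the neighborhood radius `rho` are all illustrative assumptions standing in for a real GAN objective and its hyperparameters.

```python
import math

def loss(w):
    # Toy quadratic loss (illustrative stand-in for a real training loss).
    return 0.5 * (2.0 * w[0] ** 2 + 0.5 * w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return [2.0 * w[0], 0.5 * w[1]]

def sam_step(w, lr=0.1, rho=0.05):
    """One SAM update: move to a nearby higher-loss point, then descend with its gradient."""
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # Perturb the weights along the normalized gradient, at distance rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad(w_adv)
    # Apply the gradient from the perturbed point to the original weights.
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

w = [1.0, 1.0]
initial_loss = loss(w)
for _ in range(100):
    w = sam_step(w)
final_loss = loss(w)
```

The only difference from plain gradient descent is that the gradient is evaluated at the perturbed point `w_adv`, which penalizes solutions whose loss rises sharply within a radius `rho`.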

Paper Structure

This paper contains 33 sections, 1 theorem, 13 equations, 4 figures, 6 tables.

Key Result

Theorem 4.2

(Adapted from Sharma23). Let $\mathbf{x}$ be a random variable in $\mathcal{X}\subseteq \mathbb{R}^D$ and $\mathbf{f}_{\mathrm{gen}}(\cdot, \boldsymbol{\psi}): \mathbb{R}^P \rightarrow \mathcal{X}$ be a neural network satisfying assumption:uat. Let ...

Figures (4)

  • Figure 1: Graphical model representation of the various models studied in this work. We use the standard convention that nodes denote stochastic random variables, while dots represent deterministic ones. Shaded nodes denote observed variables, and we use the plate notation to indicate that certain random variables have $N$ repetitions. Model a represents a BNN with full stochasticity for generative modeling (e.g., TranNeurIPS21). Model b represents a BNN with partial stochasticity, where all stochasticity is captured by $\mathbf{z}$ (Sharma23); this is the construction that we use to study GANs in this work. Model c is the graphical model of Bayesian GANs (Saatchi17).
  • Figure 2: Results on two-dimensional Gaussian data. Details on the experimental setup are given in the main text.
  • Figure 3: MNIST and CelebA: uncurated samples generated from the models in the results summary table (tab:res:summary).
  • Figure 4: Uncurated samples from the baseline model and the model with likelihood relaxation on the FFHQ-256 data.

Theorems & Definitions (1)

  • Theorem 4.2