From optimal transport to generative modeling: the VEGAN cookbook

Olivier Bousquet, Sylvain Gelly, Ilya Tolstikhin, Carl-Johann Simon-Gabriel, Bernhard Schoelkopf

TL;DR

The paper reframes unsupervised generative modeling as an optimal transport (OT) problem between the true data distribution $P_X$ and a latent-variable model $P_G$, and derives a primal formulation that passes through a probabilistic encoder $Q(Z|X)$ and a (potentially random) decoder $P_G(Y|Z)$. Relaxing the constraint that the encoder's marginal distribution over codes match the prior $P_Z$ into a penalty yields the penalized optimal transport (POT) objective, which can be minimized with SGD using samples from $P_X$ and $P_G$. For the squared Euclidean cost (the 2-Wasserstein case), POT coincides with the adversarial auto-encoder (AAE) objective; for the 1-Wasserstein distance, it is dual to the Wasserstein GAN (WGAN). The paper also relates POT to VAEs and AVB, offers an explanation for the blurriness of VAE-generated images, and places GANs, VAEs, and AAEs within a single OT framework, which is the "VEGAN cookbook" perspective of the title.

Abstract

We study unsupervised generative modeling in terms of the optimal transport (OT) problem between true (but unknown) data distribution $P_X$ and the latent variable model distribution $P_G$. We show that the OT problem can be equivalently written in terms of probabilistic encoders, which are constrained to match the posterior and prior distributions over the latent space. When relaxed, this constrained optimization problem leads to a penalized optimal transport (POT) objective, which can be efficiently minimized using stochastic gradient descent by sampling from $P_X$ and $P_G$. We show that POT for the 2-Wasserstein distance coincides with the objective heuristically employed in adversarial auto-encoders (AAE) (Makhzani et al., 2016), which provides the first theoretical justification for AAEs known to the authors. We also compare POT to other popular techniques like variational auto-encoders (VAE) (Kingma and Welling, 2014). Our theoretical results include (a) a better understanding of the commonly observed blurriness of images generated by VAEs, and (b) establishing duality between Wasserstein GAN (Arjovsky and Bottou, 2017) and POT for the 1-Wasserstein distance.
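To make the notion of an OT distance between data samples and model samples concrete, here is a minimal sketch for the one-dimensional case, where the empirical $p$-Wasserstein distance between two equal-size samples reduces to comparing sorted values. The function name `wasserstein_1d` and the Gaussian stand-ins for $P_X$ and $P_G$ are assumptions made for illustration; they do not come from the paper, which works with general costs and high-dimensional data.

```python
import numpy as np


def wasserstein_1d(x, y, p=1):
    """Empirical p-Wasserstein distance between two equal-size 1-D samples.

    In one dimension the optimal coupling matches the i-th smallest point
    of x with the i-th smallest point of y, so sorting is sufficient.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("samples must have the same size")
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))


rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=1000)   # hypothetical stand-in for P_X
model = rng.normal(loc=0.5, scale=1.0, size=1000)  # hypothetical stand-in for P_G

print(wasserstein_1d(data, model, p=1))
print(wasserstein_1d(data, model, p=2))
```

In higher dimensions no such closed form is available, which is one reason a formulation that can be optimized from samples with SGD is attractive.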

Paper Structure

This paper contains 26 sections, 8 theorems, 30 equations, 1 figure.

Key Result

Theorem 1

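If $P_G(Y|Z=z)=\delta_{G(z)}$ for all $z\in \mathcal{Z}$, where $G\colon \mathcal{Z}\to\mathcal{X}$, we have

$$W_c(P_X, P_G) = \inf_{Q\colon Q_Z = P_Z} \mathbb{E}_{P_X}\mathbb{E}_{Q(Z|X)}\big[c\big(X, G(Z)\big)\big],$$

where $c$ is the transport cost, $W_c$ is the corresponding optimal transport cost, and $Q_Z$ is the marginal distribution of $Z$ when $X\sim P_X$ and $Z\sim Q(Z|X)$.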
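The relaxed, penalized form of this problem can be sketched as a Monte Carlo estimate: an expected reconstruction cost under the encoder plus a weighted penalty on the mismatch between $Q_Z$ and $P_Z$. The sketch below is illustrative only. The names `pot_objective` and `rbf_mmd2`, the linear encoder and decoder, and the weight `lam` are all assumptions, and the penalty is a kernel MMD estimate swapped in for simplicity rather than the adversarial penalty that the AAE connection relies on.

```python
import numpy as np


def rbf_mmd2(a, b, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples a and b (RBF kernel)."""
    def kernel(u, v):
        sq = np.sum((u[:, None, :] - v[None, :, :]) ** 2, axis=-1)
        return np.exp(-sq / (2.0 * bandwidth ** 2))
    return float(kernel(a, a).mean() + kernel(b, b).mean() - 2.0 * kernel(a, b).mean())


def pot_objective(x, encode, decode, sample_prior, lam=10.0):
    """Monte Carlo estimate of E[c(X, G(Z))] + lam * D(Q_Z, P_Z).

    Assumptions: c is the squared Euclidean cost, the decoder is
    deterministic, and D is an MMD estimate (a stand-in divergence).
    """
    z = encode(x)                                     # Z ~ Q(Z|X); the batch approximates Q_Z
    recon = decode(z)                                 # deterministic decoder G(Z)
    cost = np.mean(np.sum((x - recon) ** 2, axis=1))  # reconstruction cost
    penalty = rbf_mmd2(z, sample_prior(len(x)))       # mismatch between Q_Z and P_Z
    return float(cost + lam * penalty)


rng = np.random.default_rng(0)
x = rng.normal(size=(256, 4))          # hypothetical data batch
W = 0.5 * rng.normal(size=(4, 2))      # hypothetical shared linear weights

encode = lambda batch: batch @ W + 0.1 * rng.normal(size=(len(batch), 2))  # random encoder
decode = lambda codes: codes @ W.T                                         # decoder G
sample_prior = lambda n: rng.normal(size=(n, 2))                           # P_Z = N(0, I)

print(pot_objective(x, encode, decode, sample_prior))
```

In an actual training loop the encoder and decoder would be neural networks and this quantity would be minimized by SGD over mini-batches; here it is only evaluated once to show how the two terms combine.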

Figures (1)

  • Figure 1: Different behaviours of generative models. The top half represents the latent space $\mathcal{Z}$ with codes (triangles) sampled from $P_Z$. The bottom half represents the data space $\mathcal{X}$, with true data points $X$ (circles) and generated ones $Y$ (squares). The arrows represent the conditional distributions. Generally these are not one-to-one mappings, but for readability we show only one or two arrows to the most likely points. The left figure describes VAE (Kingma and Welling, 2014) and AVB (Mescheder et al., 2017): $\Gamma_{\mathrm{VAE}}(Y|X)$ is a composite of the encoder $Q_{\mathrm{VAE}}(Z|X)$ and the decoder $P_G(Y|Z)$, mapping each true data point $X$ to a distribution on generated points $Y$. For a fixed decoder, the optimal encoder $Q^*_{\mathrm{VAE}}$ will assign mass proportionally to the distance between $Y$ and $X$ and the probability $P_Z(Z)$ (see the paper's equation for the optimal VAE encoder). We see how different points $X$ are mapped with high probability to the same $Y$, while the other generated points $Y$ are reached only with low probabilities. In the right figure, the OT is expressed as a conditional mapping $\Gamma_{\mathrm{OT}}(Y | X)$. One of our main results (Theorem 1) shows that this mapping can be reparametrized via the transport $X\to Z \to Y$, making explicit the role of the encoder $Q_{\mathrm{OT}}(Z|X)$.

Theorems & Definitions (14)

  • Theorem 1
  • Remark 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • proof
  • Proposition 5
  • proof
  • Lemma 6
  • proof
  • ...and 4 more