From Variational to Deterministic Autoencoders

Partha Ghosh, Mehdi S. M. Sajjadi, Antonio Vergari, Michael Black, Bernhard Schölkopf

TL;DR

This work questions the necessity of variational inference for generative modeling and introduces Regularized Autoencoders (RAEs), a deterministic alternative to VAEs that achieves competitive sample quality through explicit decoder regularization. By showing that VAE sampling can be viewed as input-noise injection to a deterministic decoder, RAEs replace stochasticity with regularization and omit the KL term, while adding an ex-post density estimation step to restore a workable generative mechanism for sampling. The authors demonstrate strong results on image datasets and extend the approach to structured data via GrammarRAE, with ex-post density estimation further boosting performance across VAEs, WAEs, and RAEs. The findings suggest a simpler, scalable path to high-quality generative modeling that extends beyond images to structured domains like molecules.
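The noise-injection view mentioned above can be illustrated with a small sketch. It assumes a Gaussian encoder with mean `mu` and standard deviation `sigma`, and a toy deterministic decoder (`decode`, a single linear map followed by tanh) that is purely hypothetical and not the architecture used in the paper. Under these assumptions, decoding a reparameterized sample matches decoding the encoder mean after adding Gaussian noise to it.

```python
import numpy as np

def sample_encoder(mu, sigma, rng):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

def decode(z, weights):
    """Hypothetical deterministic decoder: a linear map followed by tanh."""
    return np.tanh(z @ weights)

def decode_with_input_noise(mu, sigma, weights, rng):
    """Treat the encoder mean as a deterministic code and perturb the decoder input."""
    noise = sigma * rng.standard_normal(mu.shape)
    return decode(mu + noise, weights)

# Toy values, chosen only for illustration.
mu = np.array([[0.5, -1.0]])
sigma = np.array([[0.1, 0.2]])
weights = np.array([[1.0, 0.0, 2.0],
                    [0.0, 1.0, -1.0]])

# Using the same seed makes both paths draw the same eps.
out_vae = decode(sample_encoder(mu, sigma, np.random.default_rng(0)), weights)
out_noise = decode_with_input_noise(mu, sigma, weights, np.random.default_rng(0))
```

With identical random draws, the two outputs coincide, which is the sense in which the stochastic encoder acts as a noise source on the decoder input.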

Abstract

Variational Autoencoders (VAEs) provide a theoretically-backed and popular framework for deep generative models. However, learning a VAE from data still poses unanswered theoretical questions and considerable practical challenges. In this work, we propose an alternative framework for generative modeling that is simpler, easier to train, and deterministic, yet has many of the advantages of VAEs. We observe that sampling a stochastic encoder in a Gaussian VAE can be interpreted as simply injecting noise into the input of a deterministic decoder. We investigate how substituting this kind of stochasticity with other explicit and implicit regularization schemes can lead to an equally smooth and meaningful latent space without forcing it to conform to an arbitrarily chosen prior. To retrieve a generative mechanism to sample new data, we introduce an ex-post density estimation step that can also be readily applied to existing VAEs, improving their sample quality. We show, in a rigorous empirical study, that the proposed regularized deterministic autoencoders are able to generate samples that are comparable to, or better than, those of VAEs and more powerful alternatives when applied to images as well as to structured data such as molecules. An implementation is available at: https://github.com/ParthaEth/Regularized_autoencoders-RAE-
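The ex-post density estimation step can be sketched roughly as follows: after training, fit a density to the latent codes of the training set, then sample from that density and decode. The sketch below fits a single full-covariance Gaussian, which is an assumption made here for simplicity; a mixture model could be substituted. The array `latents` is a synthetic stand-in for encoder outputs, and the function names are hypothetical.

```python
import numpy as np

def fit_gaussian(latents):
    """Estimate the mean and full covariance of the latent codes (rows are samples)."""
    mean = latents.mean(axis=0)
    cov = np.cov(latents, rowvar=False)
    return mean, cov

def sample_latents(mean, cov, n, rng):
    """Draw n new latent codes from the fitted Gaussian."""
    return rng.multivariate_normal(mean, cov, size=n)

rng = np.random.default_rng(0)

# Synthetic stand-in for the codes a trained encoder would produce (assumption).
scales = np.diag([1.0, 2.0, 0.5, 3.0])
offset = np.array([1.0, -2.0, 0.0, 5.0])
latents = rng.standard_normal((1000, 4)) @ scales + offset

mean, cov = fit_gaussian(latents)
new_z = sample_latents(mean, cov, 16, rng)  # these would be passed to the decoder
```

Because the density is fitted after training, the same procedure applies to any autoencoder that yields latent codes, regardless of whether a prior was enforced during training.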

Paper Structure

This paper contains 21 sections, 15 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: Qualitative evaluation of sample quality for VAEs, WAEs, 2sVAEs, and RAEs on CelebA. RAE provides slightly sharper samples and reconstructions while interpolating smoothly in the latent space. Corresponding qualitative overviews for MNIST and CIFAR-10 are provided in Appendix \ref{sec:more-pics}.
  • Figure 2: Generating structured objects by GVAE, CVAE and GRAE. (Upper left) Percentage of valid samples and their average mean score (see text, Section \ref{sec:GrammarRAE}). The three best expressions (lower left) and molecules (upper right) and their scores are reported for all models.
  • Figure 3: (Left) Test reconstruction quality for a VAE trained on MNIST with different numbers of samples in the latent space as in Eq. \ref{eq:vae-sampling-encoder}, measured by FID (lower is better). Larger numbers of Monte-Carlo samples clearly improve training; however, the increased accuracy comes with larger memory and computation requirements. In practice, the most common choice is therefore $k=1$. (Right) Reconstruction and random sample quality (FID, y-axis, lower is better) of a VAE on MNIST for different trade-offs between $\mathcal{L}_{\mathsf{REC}}$ and $\mathcal{L}_{\mathsf{KL}}$ (x-axis, see Eq. \ref{eq:elbo-loss}). Higher weights for $\mathcal{L}_{\mathsf{KL}}$ improve random samples but hurt reconstruction. This is especially noticeable towards the optimality point ($\beta \approx 10^1$). This indicates that enforcing structure in the VAE latent space leads to a penalty in quality.
  • Figure 4: PRD curves of all RAE methods (left) tell a similar story to the FID scores: RAE-SN seems to perform best in both precision and recall. PRD curves of all traditional VAE variants (middle): as with the FID scores, there is no clear winner. PRD curves for the WAE (with isotropic Gaussian prior), the WAE-GMM model with ex-post density estimation by a 10-component GMM, and RAE+SN-GMM (right). This finer-grained view shows that the WAE-GMM achieves higher recall but lower precision than RAE+SN-GMM while achieving comparable FID scores. Note that ex-post density estimation greatly boosts the WAE model in both PRD and FID scores.
  • Figure 5: PRD curves of all methods in the image data experiments on MNIST. For each plot, we show the PRD curve obtained with the fixed prior and with the density fitted by ex-post density estimation (XPDE). XPDE greatly boosts both precision and recall for all models.
  • ...and 7 more figures