Diagnosing and Enhancing VAE Models

Bin Dai; David Wipf

TL;DR

The paper challenges the view that Gaussian encoder/decoder choices inherently limit VAE performance. It shows that, in principle, the ground-truth distribution can be recovered at optimality in the $r=d$ regime (data manifold dimension $r$ equal to the ambient dimension $d$), whereas in the $r<d$ case optimal solutions place near-correct probability mass on the manifold but are not unique. It then introduces a simple, practical two-stage VAE enhancement that first learns a low-dimensional manifold and then learns the distribution on that manifold, producing crisp samples and Fréchet Inception Distance (FID) scores competitive with some GANs under a neutral architecture and without extra hyperparameters. Through theoretical results and extensive experiments on MNIST, Fashion-MNIST, CIFAR-10, and CelebA, the work demonstrates that this two-stage approach reduces the mismatch between the aggregated posterior and a standard Gaussian, yields stable sampling, and remains robust to the choice of latent dimension. Altogether, the method provides a principled route to improving VAE-based generative modeling, narrowing the gap to GANs while preserving VAE advantages such as stable training and interpretable inference.
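To make the two-stage idea concrete, the following is a minimal sketch of how sampling could proceed once both stages are trained: draw a second-stage latent from a standard Gaussian, decode it to a first-stage latent, then decode that to data space. Everything here is an illustrative assumption rather than the authors' implementation: the names `decode_stage1`, `decode_stage2`, and `sample_two_stage` are hypothetical, the dimensions are arbitrary, random linear maps stand in for trained decoder networks, and only decoder means are used (no decoder noise).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): ambient dimension d, latent dimension kappa.
d, kappa = 64, 8

# Hypothetical stand-ins for trained decoders: fixed random linear maps.
W1 = rng.normal(size=(kappa, d))      # first-stage decoder weights: z -> x
W2 = rng.normal(size=(kappa, kappa))  # second-stage decoder weights: u -> z


def decode_stage2(u):
    """Map second-stage latents u to first-stage latents z (placeholder mean)."""
    return u @ W2


def decode_stage1(z):
    """Map first-stage latents z to data space x (placeholder mean in [-1, 1])."""
    return np.tanh(z @ W1)


def sample_two_stage(n):
    """Draw n samples: u ~ N(0, I), then z from stage 2, then x from stage 1."""
    u = rng.standard_normal((n, kappa))
    z = decode_stage2(u)
    return decode_stage1(z)


samples = sample_two_stage(5)
print(samples.shape)
```

The design point the sketch illustrates is that the standard Gaussian prior is applied only to the second-stage latent, so the first-stage decoder is fed latents shaped by the second stage rather than raw Gaussian draws.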

Abstract

Although variational autoencoders (VAEs) represent a widely influential deep generative model, many aspects of the underlying energy function remain poorly understood. In particular, it is commonly believed that Gaussian encoder/decoder assumptions reduce the effectiveness of VAEs in generating realistic samples. In this regard, we rigorously analyze the VAE objective, differentiating situations where this belief is and is not actually true. We then leverage the corresponding insights to develop a simple VAE enhancement that requires no additional hyperparameters or sensitive tuning. Quantitatively, this proposal produces crisp samples and stable FID scores that are actually competitive with a variety of GAN models, all while retaining desirable attributes of the original VAE architecture. A shorter version of this work will appear in the ICLR 2019 conference proceedings (Dai and Wipf, 2019). The code for our model is available at https://github.com/daib13/TwoStageVAE.
