Diagnosing and Enhancing VAE Models
Bin Dai, David Wipf
TL;DR
The paper challenges the view that Gaussian encoder/decoder assumptions inherently limit VAE performance. It shows that, in principle, a globally optimal VAE can recover the ground-truth distribution when the data manifold dimension $r$ equals the ambient dimension $d$, and that when $r<d$ optimal solutions place near-correct probability mass on the data manifold but are not unique. It then introduces a simple, practical two-stage VAE enhancement that first learns a low-dimensional manifold and then learns the distribution on that manifold, producing crisp samples and Fréchet Inception Distance (FID) scores competitive with some GANs under a neutral architecture and without extra hyperparameters. Theoretical results and experiments on MNIST, Fashion-MNIST, CIFAR-10, and CelebA indicate that the two-stage approach reduces the mismatch between the aggregated posterior and a standard Gaussian, yields stable sampling, and remains robust to the choice of latent dimension. Altogether, the method offers a principled route to improving VAE-based generative modeling, narrowing the gap to GANs while preserving VAE advantages such as stable training and interpretable inference.
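The two-stage idea summarized above can be illustrated with a minimal sampling sketch. It assumes that generation runs the stages in reverse order: noise is drawn from a standard Gaussian, mapped to a first-stage latent code by the second-stage decoder, and then mapped to data space by the first-stage decoder. The decoders here are random affine maps standing in for trained networks, and every name and dimension (`kappa`, `decode_stage1`, `decode_stage2`, `sample_two_stage`) is hypothetical rather than taken from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: ambient dimension d and latent dimension kappa.
d, kappa = 8, 2

# Stand-ins for trained decoders: random affine maps, NOT learned networks.
W1, b1 = rng.normal(size=(d, kappa)), rng.normal(size=d)          # stage 1: z -> x
W2, b2 = rng.normal(size=(kappa, kappa)), rng.normal(size=kappa)  # stage 2: u -> z


def decode_stage2(u):
    """Map standard-normal noise u to a first-stage latent code z."""
    return u @ W2.T + b2


def decode_stage1(z):
    """Map a latent code z to a point x in the ambient space."""
    return z @ W1.T + b1


def sample_two_stage(n, rng):
    """Draw n samples by chaining the stages: u ~ N(0, I) -> z -> x."""
    u = rng.standard_normal((n, kappa))
    z = decode_stage2(u)
    x = decode_stage1(z)
    return u, z, x


u, z, x = sample_two_stage(1000, rng)
print(x.shape)

# With affine decoders, the samples span an affine subspace of dimension
# at most kappa inside the d-dimensional ambient space.
rank = np.linalg.matrix_rank(x - x.mean(axis=0))
print(rank)
```

In this toy setting the first stage fixes where samples can live (a low-dimensional subspace of the ambient space), while the second stage only reshapes how noise is distributed over the latent codes. That is the division of labor the summary describes, though real decoders would be nonlinear and trained on data.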
Abstract
Although variational autoencoders (VAEs) represent a widely influential deep generative model, many aspects of the underlying energy function remain poorly understood. In particular, it is commonly believed that Gaussian encoder/decoder assumptions reduce the effectiveness of VAEs in generating realistic samples. In this regard, we rigorously analyze the VAE objective, differentiating situations where this belief is and is not actually true. We then leverage the corresponding insights to develop a simple VAE enhancement that requires no additional hyperparameters or sensitive tuning. Quantitatively, this proposal produces crisp samples and stable FID scores that are actually competitive with a variety of GAN models, all while retaining desirable attributes of the original VAE architecture. A shorter version of this work will appear in the ICLR 2019 conference proceedings (Dai and Wipf, 2019). The code for our model is available at https://github.com/daib13/TwoStageVAE.
