VAE with a VampPrior

Jakub M. Tomczak, Max Welling

TL;DR

This work identifies the prior in variational auto-encoders as a bottleneck for learning rich latent representations and proposes the VampPrior, a variational mixture of posteriors built from learnable pseudo-inputs. Extending this idea, the authors introduce a two-layer hierarchical VAE in which the second-layer prior is a VampPrior, addressing the common issue of inactive latent variables and enabling more powerful latent coding. Empirical results across six image datasets show consistent improvements over standard priors, achieving state-of-the-art or competitive performance with both MLP and convolutional decoders, including PixelCNN-based variants. The approach also connects to Empirical Bayes and Information Bottleneck perspectives, and qualitative analyses reveal sharper generations and informative pseudo-input prototypes, highlighting the method’s practical impact for unsupervised representation learning.

Abstract

Many different methods to train deep generative models have been introduced in the past. In this paper, we propose to extend the variational auto-encoder (VAE) framework with a new type of prior which we call the "Variational Mixture of Posteriors" prior, or VampPrior for short. The VampPrior consists of a mixture distribution (e.g., a mixture of Gaussians) with components given by variational posteriors conditioned on learnable pseudo-inputs. We further extend this prior to a two-layer hierarchical model and show that this architecture, with a coupled prior and posterior, learns significantly better models. The model also avoids the usual local optima issues related to useless latent dimensions that plague VAEs. We provide empirical studies on six datasets, namely, static and binary MNIST, OMNIGLOT, Caltech 101 Silhouettes, Frey Faces and Histopathology patches, and show that applying the hierarchical VampPrior delivers state-of-the-art results on all datasets in the unsupervised permutation-invariant setting, and results that are the best or comparable to SOTA methods for the approach with convolutional networks.
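
To make the mixture construction concrete: with $K$ pseudo-inputs $u_1, \dots, u_K$ and a variational posterior $q_\phi(z \mid x)$, the VampPrior described above can be written as $p(z) = \frac{1}{K} \sum_{k=1}^{K} q_\phi(z \mid u_k)$. The following is a minimal NumPy sketch of evaluating $\log p(z)$ under that definition, assuming diagonal Gaussian posteriors. The linear `encoder`, the weight names `W_mu` and `W_logvar`, and all shapes are hypothetical stand-ins chosen for illustration, not the authors' architecture or code.

```python
import numpy as np

def encoder(x, W_mu, W_logvar):
    """Hypothetical toy encoder: linear maps to the mean and log-variance of q(z|x)."""
    return x @ W_mu, x @ W_logvar

def log_normal_diag(z, mu, logvar):
    """Log density of a diagonal Gaussian, summed over the latent dimensions."""
    per_dim = np.log(2.0 * np.pi) + logvar + (z - mu) ** 2 / np.exp(logvar)
    return -0.5 * np.sum(per_dim, axis=-1)

def vamp_prior_log_prob(z, pseudo_inputs, W_mu, W_logvar):
    """log p(z) = log( (1/K) * sum_k q(z | u_k) ) for a batch of latents z."""
    # The mixture components reuse the encoder, applied to the pseudo-inputs.
    mu, logvar = encoder(pseudo_inputs, W_mu, W_logvar)              # each (K, D)
    # Broadcast z (B, 1, D) against the components (1, K, D) -> (B, K).
    log_q = log_normal_diag(z[:, None, :], mu[None], logvar[None])
    # Numerically stable log-sum-exp over the K components.
    m = log_q.max(axis=1, keepdims=True)
    log_sum = m[:, 0] + np.log(np.exp(log_q - m).sum(axis=1))
    return log_sum - np.log(log_q.shape[1])

# Toy usage: 3 pseudo-inputs in a 4-dimensional input space, 2 latent dimensions.
rng = np.random.default_rng(0)
W_mu = rng.normal(size=(4, 2))
W_logvar = 0.1 * rng.normal(size=(4, 2))
pseudo_inputs = rng.normal(size=(3, 4))   # in training these would be learnable
z = rng.normal(size=(5, 2))
log_p = vamp_prior_log_prob(z, pseudo_inputs, W_mu, W_logvar)        # shape (5,)
```

In this sketch the prior shares its parameters with the encoder, which is the "coupled prior and posterior" the abstract refers to; in an actual model the pseudo-inputs would be trained jointly with the encoder weights by backpropagation rather than sampled once as here.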

Paper Structure

This paper contains 25 sections, 16 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Stochastic dependencies in: (a) a one-layered VAE and (b) a two-layered model. The generative part is denoted by the solid line and the variational part is denoted by the dashed line.
  • Figure 2: A comparison between the HVAE ($L=2$) with SG prior, MoG prior and VampPrior in terms of ELBO and varying number of pseudo-inputs/components on static MNIST.
  • Figure 3: A comparison between a two-level VAE and IWAE with the standard normal prior and their VampPrior counterparts in terms of the number of active units for a varying number of pseudo-inputs on static MNIST.
  • Figure 4: (top row) Images generated by PixelHVAE + VampPrior for the chosen pseudo-input shown in the top-left corner. (bottom row) A subset of trained pseudo-inputs for different datasets.
  • Figure 5: (a) Real images from test sets and images generated by (b) the vanilla VAE, (c) the HVAE ($L=2$) + VampPrior, (d) the convHVAE ($L=2$) + VampPrior and (e) the PixelHVAE ($L=2$) + VampPrior.
  • ...and 2 more figures