BIVA: A Very Deep Hierarchy of Latent Variables for Generative Modeling

Lars Maaløe, Marco Fraccaro, Valentin Liévin, Ole Winther

TL;DR

BIVA introduces a very deep hierarchical latent-variable model that uses a deterministic top-down pathway and a bidirectional bottom-up/top-down inference network to enable rich posterior covariances and robust information flow. The approach yields state-of-the-art likelihoods on several benchmarks, produces sharp natural-image samples, and supports anomaly detection and semi-supervised classification. Through extensive ablations and diverse experiments, the work demonstrates that deep latent hierarchies with skip connections can match or exceed non-autoregressive methods and close the gap with autoregressive/flow-based models. This highlights the practical value of structured latent representations for high-quality generation and reliable anomaly detection in complex data distributions.

Abstract

With the introduction of the variational autoencoder (VAE), probabilistic latent variable models have received renewed attention as powerful generative models. However, their performance in terms of test likelihood and quality of generated samples has been surpassed by autoregressive models without stochastic units. Furthermore, flow-based models have recently been shown to be an attractive alternative that scales well to high-dimensional data. In this paper we close the performance gap by constructing VAE models that can effectively utilize a deep hierarchy of stochastic variables and model complex covariance structures. We introduce the Bidirectional-Inference Variational Autoencoder (BIVA), characterized by a skip-connected generative model and an inference network formed by a bidirectional stochastic inference path. We show that BIVA reaches state-of-the-art test likelihoods, generates sharp and coherent natural images, and uses the hierarchy of latent variables to capture different aspects of the data distribution. We observe that BIVA, in contrast to recent results, can be used for anomaly detection. We attribute this to the hierarchy of latent variables which is able to extract high-level semantic features. Finally, we extend BIVA to semi-supervised classification tasks and show that it performs comparably to state-of-the-art results by generative adversarial networks.

Paper Structure

This paper contains 43 sections, 17 equations, 13 figures, 13 tables.

Figures (13)

  • Figure 1: An $L=3$ layered BIVA with (a) the generative model and (b) the inference model. Blue arrows indicate that the deterministic parameters are shared between the inference and generative models. See Appendix \ref{app:model_description} for a detailed explanation and a graphical model that includes the deterministic variables.
  • Figure 2: The $\log \mathrm{KL}(q\,\|\,p)$ for each stochastic latent variable as a function of the training epochs on CIFAR-10. (a) is an LVAE with $L=N=15$ stochastic latent layers, no skip-connections, and no bottom-up inference. (b) is an LVAE+ with $L=N=15$, skip-connections, and no bottom-up inference. (c) is a BIVA with $L=15$ stochastic latent layers ($N=29$ latent variables), for which $1, 2, \ldots, N$ denote the stochastic latent variables in the order $z_1^{\text{BU}}, z_1^{\text{TD}}, z_2^{\text{BU}}, z_2^{\text{TD}}, \ldots, z_L$.
  • Figure 3: (left) Images from the CelebA dataset preprocessed to $64 \times 64$ following Larsen16. (right) $\mathcal{N}(0,I)$ generations of a BIVA with $L=20$ layers that achieves $\mathcal{L}_1 = 2.48$ bits/dim on the test set.
  • Figure 4: Histograms and kernel density estimates of $\mathcal{L}^{>k}$ for $k=13,11,0$, evaluated in bits/dim by a model trained on the CIFAR-10 training set and evaluated on the CIFAR-10 and SVHN test sets.
  • Figure 5: (a) Generative model of a VAE/LVAE with $L=3$ stochastic variables, (b) VAE inference model, (c) LVAE inference model, and (d) skip connections among stochastic variables in the LVAE where dashed lines denote a skip-connection. Blue arrows indicate that there are shared parameters between the inference and generative model.
  • ...and 8 more figures
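
The captions above report two quantities: a per-variable $\mathrm{KL}(q\,\|\,p)$ (Figure 2) and likelihood bounds in bits/dim (Figures 3 and 4). The following is a minimal sketch of how such quantities are commonly computed. It assumes diagonal Gaussian posteriors and priors, which this section does not state, and the function names `kl_diag_gaussians` and `nats_to_bits_per_dim` are hypothetical helpers, not code from the paper.

```python
import math


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) in nats between two diagonal Gaussians, summed over dimensions.

    Assumption: both distributions are diagonal Gaussians given by
    per-dimension means and standard deviations.
    """
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return total


def nats_to_bits_per_dim(nats, num_dims):
    """Convert a total bound or negative log-likelihood in nats to bits per dimension."""
    return nats / (num_dims * math.log(2.0))


# Identical distributions have zero divergence.
kl_same = kl_diag_gaussians([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])

# Shifting one mean by 1 under unit variance contributes 0.5 nats.
kl_shift = kl_diag_gaussians([1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])

# A 32x32 RGB image (the CIFAR-10 resolution) has 3072 dimensions.
cifar_dims = 32 * 32 * 3
bits_per_dim = nats_to_bits_per_dim(cifar_dims * math.log(2.0), cifar_dims)
```

A curve like those in Figure 2 would plot `math.log` of this per-variable KL, averaged over a batch, against the training epoch. A variable whose KL stays near zero carries almost no information about the input.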