Continuous Mixtures of Tractable Probabilistic Models

Alvaro H. C. Correia, Gennaro Gala, Erik Quaeghebeur, Cassio de Campos, Robert Peharz

TL;DR

This paper investigates a hybrid approach: continuous mixtures of tractable models with a small latent dimension. This simple scheme proves remarkably effective, as PCs learnt this way set a new state of the art for tractable models on many standard density estimation benchmarks.

Abstract

Probabilistic models based on continuous latent spaces, such as variational autoencoders, can be understood as uncountable mixture models where components depend continuously on the latent code. They have proven to be expressive tools for generative and probabilistic modelling, but are at odds with tractable probabilistic inference, that is, computing marginals and conditionals of the represented probability distribution. Meanwhile, tractable probabilistic models such as probabilistic circuits (PCs) can be understood as hierarchical discrete mixture models, and thus are capable of performing exact inference efficiently but often show subpar performance in comparison to continuous latent-space models. In this paper, we investigate a hybrid approach, namely continuous mixtures of tractable models with a small latent dimension. While these models are analytically intractable, they are well amenable to numerical integration schemes based on a finite set of integration points. With a large enough number of integration points the approximation becomes de-facto exact. Moreover, for a finite set of integration points, the integration method effectively compiles the continuous mixture into a standard PC. In experiments, we show that this simple scheme proves remarkably effective, as PCs learnt this way set new state of the art for tractable models on many standard density estimation benchmarks.
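
The compilation step described in the abstract can be illustrated with a minimal sketch. It approximates $p(x) = \int p(x \mid z)\, p(z)\, dz$ by a finite sum over integration points, which yields an ordinary discrete mixture whose marginals are exact to compute. Everything specific here is an assumption made for illustration and not taken from the paper: the hypothetical linear-sigmoid `decoder`, the standard normal prior, the fully factorised Bernoulli components, and plain Monte Carlo points with uniform weights (other integration schemes would only change how the points and weights are chosen).

```python
import numpy as np

def decoder(z, W, b):
    """Hypothetical decoder: maps latent codes z (N, d) to Bernoulli means (N, D)."""
    return 1.0 / (1.0 + np.exp(-(z @ W + b)))

def compile_mixture(W, b, n_points, rng):
    """Pick N integration points and return a finite mixture (log-weights, components).

    Assumption: points are drawn from a standard normal prior and weighted
    uniformly (plain Monte Carlo).
    """
    d = W.shape[0]
    z = rng.standard_normal((n_points, d))        # integration points z_1..z_N
    probs = decoder(z, W, b)                       # one factorised component per point
    probs = np.clip(probs, 1e-12, 1.0 - 1e-12)     # avoid log(0)
    log_w = np.full(n_points, -np.log(n_points))   # uniform mixture weights 1/N
    return log_w, probs

def log_prob(x, log_w, probs, observed=None):
    """log p(x_observed) under the compiled mixture.

    Variables with observed[j] == False are marginalised out exactly: in a
    factorised component this just drops their factor from the product.
    """
    x = np.asarray(x, dtype=float)
    if observed is None:
        observed = np.ones(x.shape, dtype=bool)
    log_px = x * np.log(probs) + (1.0 - x) * np.log(1.0 - probs)  # (N, D)
    comp = (log_px * observed).sum(axis=1)                        # (N,)
    a = log_w + comp
    m = a.max()
    return m + np.log(np.exp(a - m).sum())                        # log-sum-exp

rng = np.random.default_rng(0)
W, b = rng.standard_normal((2, 4)), np.zeros(4)   # latent dim 2, four binary variables
log_w, probs = compile_mixture(W, b, n_points=64, rng=rng)
full = log_prob([1, 0, 1, 1], log_w, probs)
marginal = log_prob([1, 0, 0, 0], log_w, probs,
                    observed=np.array([True, False, False, False]))
```

Once the points are fixed, the model is just a mixture with `n_points` components, so marginal queries such as `marginal` above cost no more than a full evaluation. Increasing `n_points` tightens the approximation of the integral at a linear cost in evaluation time.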

Paper Structure (18 sections, 15 equations, 9 figures, 12 tables)

This paper contains 18 sections, 15 equations, 9 figures, 12 tables.

Figures (9)

  • Figure 1: Relative performance gap to the best log-likelihood in Table \ref{tab:20datasets} as a function of the number of integration points at test time, averaged over all 20 datasets. Latent Optimisation is deliberately run with fewer integration points yet performs best. Lower is better.
  • Figure 2: Samples from 'Small Einet' (left column), 'Big Einet' (middle column) and $\mathrm{cm}(\mathcal{S}_{\mathsf{F}})$ (right column).
  • Figure 3: Test log-likelihood on the Binary MNIST dataset against latent dimensionality for $\mathrm{cm}(\mathcal{S}_{\mathsf{F}})$ (left) and $\mathrm{cm}(\mathcal{S}_{\mathsf{F}})_{\mathsf{VAE}}$ (right) models evaluated with different numbers of integration points.
  • Figure 4: Image samples from models with normal distributions at the leaves: 'Small Einet' (left column), 'Big Einet' (middle column) and $\mathrm{cm}(\mathcal{S}_{\mathsf{F}})$ (right column). Once more we see that $\mathrm{cm}(\mathcal{S}_{\mathsf{F}})$ offers better sample quality than Einets.
  • Figure 5: Image samples from models with normal distributions at the leaves but ignoring the variance of individual pixels: 'Small Einet' (left column), 'Big Einet' (middle column) and $\mathrm{cm}(\mathcal{S}_{\mathsf{F}})$ (right column).
  • ...and 4 more figures

Theorems & Definitions (1)

  • Definition 1: Probabilistic Circuit