Continuous Mixtures of Tractable Probabilistic Models

Alvaro H. C. Correia; Gennaro Gala; Erik Quaeghebeur; Cassio de Campos; Robert Peharz

TL;DR

This paper investigates a hybrid approach: continuous mixtures of tractable models with a small latent dimension. This simple scheme proves remarkably effective, as PCs learnt this way set a new state of the art for tractable models on many standard density estimation benchmarks.

Abstract

Probabilistic models based on continuous latent spaces, such as variational autoencoders, can be understood as uncountable mixture models where components depend continuously on the latent code. They have proven to be expressive tools for generative and probabilistic modelling, but are at odds with tractable probabilistic inference, that is, computing marginals and conditionals of the represented probability distribution. Meanwhile, tractable probabilistic models such as probabilistic circuits (PCs) can be understood as hierarchical discrete mixture models, and thus are capable of performing exact inference efficiently but often show subpar performance in comparison to continuous latent-space models. In this paper, we investigate a hybrid approach, namely continuous mixtures of tractable models with a small latent dimension. While these models are analytically intractable, they are well amenable to numerical integration schemes based on a finite set of integration points. With a large enough number of integration points the approximation becomes de-facto exact. Moreover, for a finite set of integration points, the integration method effectively compiles the continuous mixture into a standard PC. In experiments, we show that this simple scheme proves remarkably effective, as PCs learnt this way set new state of the art for tractable models on many standard density estimation benchmarks.
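The compilation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a standard normal prior over the latent code, Monte Carlo integration points with uniform weights, binary data, and a hypothetical linear-sigmoid `decoder` that outputs the means of fully factorised Bernoulli components. The names `decoder`, `compile_mixture`, `log_prob`, `W` and `b` are illustrative only.

```python
import numpy as np

def decoder(z, W, b):
    """Hypothetical decoder: latent codes (N, d) -> Bernoulli means (N, D)."""
    return 1.0 / (1.0 + np.exp(-(z @ W + b)))

def compile_mixture(num_points, latent_dim, W, b, rng):
    """Draw integration points z_i ~ N(0, I) (assumed prior) with weights 1/N.

    The result is a finite mixture of N fully factorised components,
    returned as log-weights (N,) and per-component Bernoulli means (N, D).
    """
    z = rng.standard_normal((num_points, latent_dim))
    log_w = np.full(num_points, -np.log(num_points))
    return log_w, decoder(z, W, b)

def log_prob(x, log_w, probs, observed=None):
    """log sum_i w_i prod_j p(x_j | z_i), taken over observed variables only.

    Unobserved variables are marginalised exactly by dropping their
    factors, which is valid because every component is fully factorised.
    """
    if observed is None:
        observed = np.ones(x.shape[-1], dtype=bool)
    x_o = x[observed]
    p_o = np.clip(probs[:, observed], 1e-12, 1.0 - 1e-12)
    comp = (x_o * np.log(p_o) + (1.0 - x_o) * np.log1p(-p_o)).sum(axis=1)
    a = log_w + comp
    m = a.max()  # log-sum-exp for numerical stability
    return m + np.log(np.exp(a - m).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, b = rng.standard_normal((2, 4)), rng.standard_normal(4)
    log_w, probs = compile_mixture(64, 2, W, b, rng)
    x = np.array([1.0, 0.0, 1.0, 1.0])
    mask = np.array([True, True, False, False])
    print("joint log-likelihood:", log_prob(x, log_w, probs))
    print("marginal over first two variables:", log_prob(x, log_w, probs, mask))
```

Once the integration points are fixed, the model is an ordinary finite mixture, so marginal queries cost no more than a likelihood evaluation. Using more integration points tightens the approximation to the continuous mixture at the price of a larger compiled model.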

Paper Structure (18 sections, 15 equations, 9 figures, 12 tables)

Quadrature Rules
Sparse Grids
Monte Carlo (MC)
Model Description
Latent Space
Decoder architecture
Structure in Tractable Probabilistic Models
Training
Effect of Latent Space Dimensionality
Training via Amortised Variational Inference
Plain Mixture Models
Additional Experimental Results
Binary Density Estimation Benchmarks
Non-binary Image Data
Demonstration of Tractable Queries
...and 3 more sections

Figures (9)

Figure 1: Relative performance gap to the best log-likelihood in Table \ref{tab:20datasets} as a function of the number of integration points at test time, averaged over all 20 datasets. Latent Optimisation is deliberately run with fewer integration points yet performs best. Lower is better.
Figure 2: Samples from 'Small Einet' (left column), 'Big Einet' (middle column) and $\mathrm{cm}(\mathcal{S}_{\mathsf{F}})$ (right column).
Figure 3: Test log-likelihood on the Binary MNIST dataset against latent dimensionality for $\mathrm{cm}(\mathcal{S}_{\mathsf{F}})$ (left) and $\mathrm{cm}(\mathcal{S}_{\mathsf{F}})_{\mathsf{VAE}}$ (right) models evaluated with different numbers of integration points.
Figure 4: Image samples from models with normal distributions at the leaves: 'Small Einet' (left column), 'Big Einet' (middle column) and $\mathrm{cm}(\mathcal{S}_{\mathsf{F}})$ (right column). Once more we see $\mathrm{cm}(\mathcal{S}_{\mathsf{F}})$ offers better sample quality than Einets.
Figure 5: Image samples from models with normal distributions at the leaves but ignoring the variance of individual pixels: 'Small Einet' (left column), 'Big Einet' (middle column) and $\mathrm{cm}(\mathcal{S}_{\mathsf{F}})$ (right column).
...and 4 more figures

Theorems & Definitions (1)

Definition 1: Probabilistic Circuit
