
Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits

Robert Peharz, Steven Lang, Antonio Vergari, Karl Stelzner, Alejandro Molina, Martin Trapp, Guy Van den Broeck, Kristian Kersting, Zoubin Ghahramani

TL;DR

This paper tackles the scalability bottleneck of tractable probabilistic circuits (PCs) by introducing Einsum Networks (EiNets), which combine a large number of PC operations into a single monolithic einsum operation and keep the computation numerically stable with a log-einsum-exp trick. By vectorizing PCs, organizing computations into topological layers, and using exponential-family leaves whose EM updates are implemented via automatic differentiation, EiNets achieve large speedups and memory savings while retaining exact and efficient inference for queries such as $p(\boldsymbol{X}_q \mid \boldsymbol{X}_e)$. The authors also introduce stochastic EM and relate it to natural gradients, which enables training on large-scale data; this is demonstrated by generative modelling of SVHN and CelebA, with tractable inference for tasks such as inpainting. These advances broaden the applicability of tractable probabilistic circuits to real-world, high-dimensional datasets and offer a practical alternative to less transparent generative models.
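
The log-einsum-exp trick mentioned above can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: it assumes a batch of two product children, each holding K log-densities, feeding K_out vectorized sum nodes through a hypothetical weight tensor `W`; the function name `log_einsum_exp` and the einsum subscripts are our own choices. Subtracting the per-sample maximum from each input before exponentiating maps the largest entry to exp(0) = 1, so with strictly positive weights the einsum cannot underflow to zero; the maxima are added back in log space afterwards.

```python
import numpy as np


def log_einsum_exp(W, log_left, log_right):
    """Stable log of S[b, k] = sum_{i,j} W[k, i, j] * exp(log_left[b, i]) * exp(log_right[b, j]).

    W:         (K_out, K, K) strictly positive sum-node weights.
    log_left:  (B, K) log-densities of the left child of the product node.
    log_right: (B, K) log-densities of the right child of the product node.
    """
    # Per-sample maxima; subtracting them maps the largest entry of each input to exp(0) = 1.
    left_max = log_left.max(axis=1, keepdims=True)    # (B, 1)
    right_max = log_right.max(axis=1, keepdims=True)  # (B, 1)
    left = np.exp(log_left - left_max)
    right = np.exp(log_right - right_max)
    # A single einsum forms the outer product (product node) and the weighted sum (sum node).
    s = np.einsum("bi,bj,kij->bk", left, right, W)
    # Undo the shift in log space.
    return np.log(s) + left_max + right_max           # (B, K_out)


rng = np.random.default_rng(0)
W = rng.random((2, 3, 3)) + 0.1
W /= W.sum(axis=(1, 2), keepdims=True)              # each W[k] sums to 1
log_left = np.array([[-1000.0, -1001.0, -1002.0]])
log_right = np.array([[-900.0, -905.0, -901.0]])
print(log_einsum_exp(W, log_left, log_right))       # finite values, slightly below -1900
```

With inputs around -1000, exponentiating them directly underflows to zero and the naive logarithm returns -inf, whereas the shifted version stays finite.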

Abstract

Probabilistic circuits (PCs) are a promising avenue for probabilistic modeling, as they permit a wide range of exact and efficient inference routines. Recent "deep-learning-style" implementations of PCs strive for better scalability, but are still difficult to train on real-world data, due to their sparsely connected computational graphs. In this paper, we propose Einsum Networks (EiNets), a novel implementation design for PCs, improving prior art in several regards. At their core, EiNets combine a large number of arithmetic operations in a single monolithic einsum-operation, leading to speedups and memory savings of up to two orders of magnitude, in comparison to previous implementations. As an algorithmic contribution, we show that the implementation of Expectation-Maximization (EM) can be simplified for PCs, by leveraging automatic differentiation. Furthermore, we demonstrate that EiNets scale well to datasets which were previously out of reach, such as SVHN and CelebA, and that they can be used as faithful generative image models.
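
The claim that EM can be implemented through automatic differentiation rests on a standard identity: for a sum-node weight $w_k$, the expected count needed in the E-step equals $w_k \, \partial \log P(x) / \partial w_k$, so one backward pass supplies all E-step statistics. The pure-Python sketch below illustrates this for the simplest possible circuit, a single sum node over fixed Gaussian leaves with hypothetical means `mus` and shared scale `sigma`. To stay dependency-free it substitutes central finite differences for the autodiff backward pass (with PyTorch one would instead call `backward()` on the summed log-likelihood). The `stochastic_em_step` helper assumes the common online-EM form, a convex combination of the old weights and a mini-batch EM estimate with step size `step`; it is an illustration, not necessarily the exact update used in the paper.

```python
import math


def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(weights, x, mus, sigma):
    # One sum node over Gaussian leaves: log P(x) = log sum_k w_k N(x | mu_k, sigma^2).
    # The weights are treated as free variables here (no renormalization inside).
    return math.log(sum(w * gauss_pdf(x, m, sigma) for w, m in zip(weights, mus)))


def grad_log_likelihood(weights, x, mus, sigma, eps=1e-6):
    # Stand-in for automatic differentiation: central finite differences w.r.t. each weight.
    # Assumes strictly positive weights; the relative step keeps perturbed weights positive.
    grads = []
    for k in range(len(weights)):
        h = eps * weights[k]
        up, down = list(weights), list(weights)
        up[k] += h
        down[k] -= h
        diff = log_likelihood(up, x, mus, sigma) - log_likelihood(down, x, mus, sigma)
        grads.append(diff / (2.0 * h))
    return grads


def em_step(weights, data, mus, sigma):
    # E-step from derivatives: expected counts n_k = sum_x w_k * d log P(x) / d w_k.
    counts = [0.0] * len(weights)
    for x in data:
        for k, g in enumerate(grad_log_likelihood(weights, x, mus, sigma)):
            counts[k] += weights[k] * g
    # M-step: normalize the expected counts into new sum-node weights.
    total = sum(counts)
    return [c / total for c in counts]


def stochastic_em_step(weights, batch, mus, sigma, step):
    # Assumed online-EM form: move a fraction `step` towards the mini-batch EM solution.
    target = em_step(weights, batch, mus, sigma)
    return [(1.0 - step) * w + step * t for w, t in zip(weights, target)]


mus, sigma = [-2.0, 0.0, 3.0], 1.0
data = [-2.1, -1.8, 0.2, 2.9, 3.3, 3.1]
weights = [1 / 3, 1 / 3, 1 / 3]
for _ in range(20):
    weights = em_step(weights, data, mus, sigma)
print([round(w, 3) for w in weights])
```

Because $w_k \, \partial \log P / \partial w_k = w_k p_k(x) / P(x)$ is exactly the posterior responsibility of component $k$, the derivative-based E-step reproduces classic EM for this mixture; in a deep circuit the same identity holds for every sum node, which is what makes the autodiff route attractive.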

Paper Structure

This paper contains 17 sections, 9 equations, 6 figures, 1 table, 1 algorithm.

Figures (6)

  • Figure 1: Basic einsum operation in EiNets: a sum node $\mathsf{S}$ with a single child $\mathsf{P}$, which itself has two children. All nodes are vectorized, as described in the section on vectorizing PCs, and are illustrated here for $K=5$.
  • Figure 2: Example of an einsum layer, parallelizing the basic einsum operation.
  • Figure 3: Training time and peak memory consumption of EiNets, SPFlow, and LibSPN when training randomized binary PC trees while varying the hyper-parameters $K$ (number of densities per sum/leaf), depth $D$, and number of replicas $R$, respectively. Blob size corresponds directly to the value of the hyper-parameter being varied. The total number of parameters ranged from 10k to 9.4M (varying $K$), 100k to 5.2M (varying $D$), and 24k to 973k (varying $R$). For LibSPN, some settings exhausted GPU memory and are therefore missing.
  • Figure 4: Qualitative results of EiNets trained on RGB data, namely SVHN (top, image dimensions $32 \times 32$) and CelebA (bottom, image dimensions $128 \times 128$). In all samples, the means of the Gaussian leaves were used (see the addendum section for more information).
  • Figure 5: Decomposing a layer of sum nodes with multiple children (left) into two consecutive sum layers (right). The first sum layer computes a standard einsum layer, discussed in Section 3.3 in the main paper. The second layer, the so-called mixing layer, takes element-wise mixtures (depicted with sums in boxes).
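
The einsum layer of Figures 1 and 2 and the mixing layer of Figure 5 can each be written as a single tensor contraction. The NumPy sketch below works in the linear (non-log) domain for readability; in practice the log-einsum-exp shift would wrap the first contraction. The tensor layout (batch `B`, `L` parallel node slots, `K` densities per input vector, `K_out` per output vector, `C` children per mixed sum node), the function names, and the subscript strings are our own assumptions and need not match the reference implementation.

```python
import numpy as np


def einsum_layer(left, right, W):
    # left, right: (B, L, K) densities of the two children of L product nodes.
    # W: (L, K_out, K, K) weights of the L vectorized sum nodes sitting on top.
    # For every slot l: S[b, l, k] = sum_{i,j} W[l, k, i, j] * left[b, l, i] * right[b, l, j]
    return np.einsum("bli,blj,lkij->blk", left, right, W)


def mixing_layer(children, V):
    # children: (B, L, C, K) outputs of C einsum-layer slots feeding the same sum node.
    # V: (L, K, C) element-wise mixture weights, with V[l, k, :] summing to 1.
    # Each of the K entries is mixed independently across the C children.
    return np.einsum("blck,lkc->blk", children, V)


rng = np.random.default_rng(0)
B, L, K, K_out, C = 2, 3, 4, 5, 2
W = rng.random((L, K_out, K, K))
W /= W.sum(axis=(2, 3), keepdims=True)    # each vectorized sum node is normalized
left, right = rng.random((B, L, K)), rng.random((B, L, K))
out = einsum_layer(left, right, W)
print(out.shape)                          # (2, 3, 5)

V = rng.random((L, K_out, C))
V /= V.sum(axis=2, keepdims=True)
children = rng.random((B, L, C, K_out))
mixed = mixing_layer(children, V)         # shape (B, L, K_out)
```

Packing all L slots into one einsum call is what lets a whole topological layer run as a single dense GPU kernel instead of many small, sparsely connected operations.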

Theorems & Definitions (1)

  • Definition 1: Probabilistic Circuit
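
To make Definition 1 concrete, the following pure-Python sketch builds a hypothetical two-variable circuit from Bernoulli leaves, decomposable product nodes, and one smooth sum node, and answers a conditional query $p(X_2 \mid X_1)$ with two bottom-up passes. It assumes the standard sum/product/leaf form of a PC; marginalization is done by letting a leaf return 1 when its variable is unobserved (`None`), which is exact because every leaf distribution sums to one.

```python
def bernoulli_leaf(var, p):
    # Leaf over a single binary variable; an unobserved (None) value marginalizes to 1.
    def leaf(x):
        v = x[var]
        if v is None:
            return 1.0
        return p if v == 1 else 1.0 - p
    return leaf


def product(*children):
    # Decomposable product: children have disjoint scopes.
    def node(x):
        out = 1.0
        for c in children:
            out *= c(x)
        return out
    return node


def weighted_sum(weights, children):
    # Smooth sum: all children share the same scope; weights are non-negative and sum to 1.
    def node(x):
        return sum(w * c(x) for w, c in zip(weights, children))
    return node


# Hypothetical circuit over X1 (index 0) and X2 (index 1): a mixture of two factorized components.
pc = weighted_sum(
    [0.3, 0.7],
    [
        product(bernoulli_leaf(0, 0.9), bernoulli_leaf(1, 0.2)),
        product(bernoulli_leaf(0, 0.1), bernoulli_leaf(1, 0.6)),
    ],
)

joint = pc({0: 1, 1: 1})          # P(X1=1, X2=1) = 0.096
evidence = pc({0: 1, 1: None})    # P(X1=1) = 0.34, with X2 marginalized in the same pass
conditional = joint / evidence    # P(X2=1 | X1=1), roughly 0.282
```

Both passes cost time linear in the circuit size, which is the kind of exact, efficient query answering that EiNets preserve while vectorizing the computation.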