Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits
Robert Peharz, Steven Lang, Antonio Vergari, Karl Stelzner, Alejandro Molina, Martin Trapp, Guy Van den Broeck, Kristian Kersting, Zoubin Ghahramani
TL;DR
This paper tackles the scalability bottleneck of tractable probabilistic circuits (PCs) by introducing Einsum Networks (EiNets), which combine a large number of arithmetic operations into a single monolithic einsum operation, kept numerically stable by a log-einsum-exp trick. By vectorizing PCs, organizing computations into topological layers, and using exponential-family leaves with EM implemented via automatic differentiation, EiNets achieve speedups and memory savings of up to two orders of magnitude over previous implementations while retaining exact inference for queries such as $p(\boldsymbol{X}_q \mid \boldsymbol{X}_e)$. The authors also introduce stochastic EM and connect it to natural gradients, enabling training on large-scale data, as demonstrated by generative modelling on SVHN and CelebA with tractable inference for tasks such as inpainting. These advances broaden the applicability of tractable probabilistic circuits to real-world, high-dimensional datasets and offer a practical alternative to more opaque generative models.
Abstract
Probabilistic circuits (PCs) are a promising avenue for probabilistic modeling, as they permit a wide range of exact and efficient inference routines. Recent "deep-learning-style" implementations of PCs strive for better scalability, but are still difficult to train on real-world data, due to their sparsely connected computational graphs. In this paper, we propose Einsum Networks (EiNets), a novel implementation design for PCs that improves on prior art in several regards. At their core, EiNets combine a large number of arithmetic operations in a single monolithic einsum operation, leading to speedups and memory savings of up to two orders of magnitude in comparison to previous implementations. As an algorithmic contribution, we show that the implementation of Expectation-Maximization (EM) can be simplified for PCs by leveraging automatic differentiation. Furthermore, we demonstrate that EiNets scale well to datasets which were previously out of reach, such as SVHN and CelebA, and that they can be used as faithful generative image models.
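
To make the log-einsum-exp idea concrete, here is a minimal NumPy sketch of one vectorized sum-product step. It is an illustration under stated assumptions, not the authors' implementation: the function name `log_einsum_exp`, the tensor shapes, and the choice to keep the weights `W` in the linear domain (non-negative and normalized over the last two axes) are all assumptions made for this example. The inputs are log-densities of two child vectors; subtracting their per-sample maxima before exponentiating keeps the einsum away from underflow, and the maxima are added back afterwards.

```python
import numpy as np


def log_einsum_exp(W, log_left, log_right):
    """Stable log of sum_{i,j} W[k,i,j] * exp(log_left[b,i]) * exp(log_right[b,j]).

    W:         (K, I, J) non-negative weights in the linear domain (assumed layout).
    log_left:  (B, I) log-densities of the left child vector.
    log_right: (B, J) log-densities of the right child vector.
    Returns:   (B, K) log-densities of the resulting sum nodes.
    """
    # Per-sample maxima; after subtraction the largest entry exponentiates to 1.
    a = log_left.max(axis=1, keepdims=True)    # shape (B, 1)
    c = log_right.max(axis=1, keepdims=True)   # shape (B, 1)
    left = np.exp(log_left - a)
    right = np.exp(log_right - c)
    # A single einsum performs all products and weighted sums at once.
    prod = np.einsum("kij,bi,bj->bk", W, left, right)
    # Return to the log-domain and undo the shift.
    return np.log(prod) + a + c
```

With log-densities far below the floating-point underflow threshold (for example around -2000), a naive `np.exp` followed by `np.einsum` and `np.log` would return negative infinity, whereas the shifted version stays finite.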
