Scaling Continuous Latent Variable Models as Probabilistic Integral Circuits

Gennaro Gala, Cassio de Campos, Antonio Vergari, Erik Quaeghebeur

TL;DR

This work expands probabilistic integral circuits (PICs) to DAG-shaped hierarchies and shows how to train them at scale by tensorizing their quadrature approximations into quadrature PCs (QPCs). It introduces region graphs (RGs) as a flexible blueprint for constructing PICs, and neural functional sharing to dramatically reduce trainable parameters while enabling large, expressive models. Empirical results demonstrate that QPCs can outperform traditional PCs on density estimation and distribution estimation across MNIST-family and RGB datasets, often with far fewer trainable parameters and comparable compute. The approach provides a principled, differentiable path to scalable, tractable continuous latent-variable models with broad implications for probabilistic modeling and efficient inference.

Abstract

Probabilistic integral circuits (PICs) have been recently introduced as probabilistic models enjoying the key ingredient behind expressive generative models: continuous latent variables (LVs). PICs are symbolic computational graphs defining continuous LV models as hierarchies of functions that are summed and multiplied together, or integrated over some LVs. They are tractable if LVs can be analytically integrated out; otherwise they can be approximated by tractable probabilistic circuits (PCs) encoding a hierarchical numerical quadrature process, called QPCs. So far, only tree-shaped PICs have been explored, and training them via numerical quadrature requires memory-intensive processing at scale. In this paper, we address these issues and present: (i) a pipeline for building DAG-shaped PICs out of arbitrary variable decompositions, (ii) a procedure for training PICs using tensorized circuit architectures, and (iii) neural functional sharing techniques to allow scalable training. In extensive experiments, we showcase the effectiveness of functional sharing and the superiority of QPCs over traditional PCs.
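The following minimal sketch illustrates the quadrature idea in the abstract for a single continuous LV $Z$. It is not the paper's implementation. Approximating $p(x) = \int p(x \mid z)\, p(z)\, dz$ with $K$ Gauss-Legendre points turns the integral into a finite mixture $\sum_k \tilde{w}_k\, p(z_k)\, p(x \mid z_k)$, which is a shallow PC. The Gaussian prior and likelihood, the truncation interval $[-6, 6]$, and the function names are assumptions, chosen so that the exact marginal $\mathcal{N}(0, 2)$ is known.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x (broadcasts over arrays)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def quadrature_marginal(x, K=32, lo=-6.0, hi=6.0):
    """Approximate p(x) = int p(x | z) p(z) dz with K Gauss-Legendre points.

    Toy assumption: p(z) = N(0, 1) and p(x | z) = N(z, 1).
    The result is a K-component mixture: sum_k w_k p(z_k) p(x | z_k).
    """
    nodes, weights = np.polynomial.legendre.leggauss(K)  # defined on [-1, 1]
    # Affine map of nodes and weights from [-1, 1] to [lo, hi].
    z = 0.5 * (hi - lo) * nodes + 0.5 * (hi + lo)
    w = 0.5 * (hi - lo) * weights
    # Mixture weights: integration weight times prior density at each node.
    mix = w * gaussian_pdf(z, 0.0, 1.0)
    # Mixture components p(x | z_k): x has shape (N,), comps has shape (N, K).
    comps = gaussian_pdf(np.asarray(x)[:, None], z[None, :], 1.0)
    return comps @ mix

x = np.linspace(-3.0, 3.0, 7)
approx = quadrature_marginal(x)
exact = gaussian_pdf(x, 0.0, 2.0)  # analytic marginal is N(0, 2)
print(np.max(np.abs(approx - exact)))
```

Roughly speaking, hierarchical PICs nest this step, with one quadrature per integral unit. That nesting is what yields a deep QPC rather than a single mixture.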

Paper Structure (29 sections, 11 equations, 10 figures, 1 table, 6 algorithms)

Figures (10)

  • Figure 1: PGM (a) $\rightarrow$ tree PIC (b)
  • Figure 2: The pipeline presented in this paper: $\textbf{RG} \rightarrow \textbf{PIC} \rightarrow \textbf{QPC} \rightarrow \textbf{folded QPC}$. Starting from (a fragment of) a DAG-shaped region graph (a), we build a DAG-like PIC via Algorithm \ref{alg:rg2pic} using Tucker-merge (b). Then, we materialize a tensorized QPC encoding a hierarchical quadrature process via Algorithm \ref{alg:pic2qpc}, using $K = 2$ quadrature points (c), which we fold to allow faster inference (d).
  • Figure 3: From functions to sum-product layers via multivariate numerical quadrature (Section \ref{sec:pic2qpc}). We illustrate how the 3-variate function $f(\{Z\}, \{Y_1, Y_2\})$ (a) can be seen as an infinite (quasi-)tensor that we first materialize w.r.t. the integration points $\tilde{\mathbf{z}}$ as a finite tensor $\boldsymbol{\mathcal{W}}$ of size $K \times K \times K$ (b, Eq. \ref{eq:tensor-materialization}), then flatten into a matrix accounting for the integration weights $\widetilde{\mathbf{w}}$ (c, Eq. \ref{eq:quad-weights}), and finally use to parameterize a Tucker layer (d, Eq. \ref{eq:tucker-layer}).
  • Figure 4: From neural C-sharing to folded CP-layer (Section \ref{sec:functional sharing}). We sketch a 4-headed MLP with Fourier features (a), which we use to parameterize a group of 4 integral units (at the same depth) of a PIC (b). Materializing these units yields a folded CP-layer parameterized by a tensor $\boldsymbol{\mathcal{W}}$ of size $2 \times 2 \times K \times K$ (c), with $K$ being the number of integration points. Note that, during materialization, the FF-MLP block in (a) is evaluated only $K^2$ times, not $4K^2$.
  • Figure 5: Learning PICs using functional sharing requires (i) resources comparable to those of PCs and (ii) up to 99% fewer trainable parameters. We compare the GPU memory (top-left) and time (bottom-left) required to perform an optimization step with PCs, with PICs using functional sharing, and with PICs without it, for three different architectures (QT-CP, QG-CP, QG-TK). On the right, we report the number of trainable parameters (i) for PCs at different $K$ and (ii) for PICs (with and without sharing) at different MLP sizes $M$. The isolated nodes refer to PIC (F, N) with QG-TK, which we could only run at $K = 16$. The benchmark is conducted using a batch of 128 RGB images of size 64x64 and Adam kingma2014adam. Extra details in Appendix \ref{app:scaling}.
  • ...and 5 more figures
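The steps in Figure 3 (materialize, weight, flatten, Tucker layer) can be sketched in NumPy as follows. This is a minimal illustration under several assumptions, all of them ours. The toy function `f` is hypothetical. $Z$, $Y_1$ and $Y_2$ share a trapezoidal grid on $[-3, 3]$. The integration weights are multiplied only into the axes of the integrated LVs $Y_1, Y_2$. The paper's exact weighting is given by its own quadrature-weights equation, which is not reproduced here.

```python
import numpy as np

def f(z, y1, y2):
    """Hypothetical positive 3-variate function f({Z}, {Y1, Y2})."""
    return np.exp(-(z - y1) ** 2 - (z - y2) ** 2)

def materialize_tucker(K=4, lo=-3.0, hi=3.0):
    """Return a (K, K^2) matrix parameterizing a Tucker layer (assumed scheme)."""
    # Integration points and trapezoidal weights shared by Z, Y1 and Y2.
    pts = np.linspace(lo, hi, K)
    w = np.full(K, (hi - lo) / (K - 1))
    w[[0, -1]] *= 0.5
    # (b) Materialize f on the grid: W[i, j, l] = f(z_i, y1_j, y2_l).
    Z, Y1, Y2 = np.meshgrid(pts, pts, pts, indexing="ij")
    W = f(Z, Y1, Y2)                                   # shape (K, K, K)
    # (c) Multiply in the weights of the integrated LVs Y1 and Y2, then flatten.
    W_tilde = W * w[None, :, None] * w[None, None, :]
    return W_tilde.reshape(K, K * K)                   # shape (K, K^2)

def tucker_layer(W_flat, a, b):
    """(d) o_i = sum_{j,l} W_flat[i, j*K + l] * a_j * b_l."""
    # Row-major flattening of the outer product matches the reshape above.
    return W_flat @ np.outer(a, b).reshape(-1)

K = 4
W_flat = materialize_tucker(K)
rng = np.random.default_rng(0)
a, b = rng.random(K), rng.random(K)  # toy outputs of the Y1 and Y2 sub-circuits
out = tucker_layer(W_flat, a, b)
print(W_flat.shape, out.shape)  # (4, 16) (4,)
```

Flattening the $K \times K \times K$ tensor into a $K \times K^2$ matrix lets a single matrix-vector product against the flattened outer product of the two child vectors implement the Tucker layer.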
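The neural functional sharing in Figure 4 can be illustrated with a small NumPy sketch. One Fourier-feature MLP with 4 output heads is evaluated once on the $K \times K$ quadrature grid. Its heads are then reshaped into the $2 \times 2 \times K \times K$ tensor of a folded CP-layer. Every concrete choice here is an assumption for illustration, not the paper's architecture:

- random, untrained weights;
- sine/cosine random Fourier features;
- a `tanh` trunk with softplus heads for positivity;
- trapezoidal weights on $[-3, 3]$;
- a mapping of the 4 heads to 2 folds $\times$ 2 inputs.

The CP-layer is written as a Hadamard product of per-input matrix-vector products.

```python
import numpy as np

rng = np.random.default_rng(0)
K, F, H, HEADS = 8, 16, 32, 4  # quadrature points, Fourier freqs, hidden units, heads

# Hypothetical shared parameters (random, untrained, for illustration only).
B = rng.normal(size=(2, F))                       # Fourier-feature frequencies
W1, b1 = rng.normal(size=(2 * F, H)) / np.sqrt(2 * F), np.zeros(H)
W2, b2 = rng.normal(size=(H, HEADS)) / np.sqrt(H), np.zeros(HEADS)

def ff_mlp(zy):
    """Map (N, 2) points (z, y) to (N, HEADS) positive outputs, one per head."""
    proj = 2 * np.pi * zy @ B                                      # (N, F)
    feats = np.concatenate([np.sin(proj), np.cos(proj)], axis=1)   # (N, 2F)
    hidden = np.tanh(feats @ W1 + b1)                # trunk shared by all heads
    return np.logaddexp(0.0, hidden @ W2 + b2)       # softplus keeps outputs > 0

# Quadrature grid on [-3, 3] with trapezoidal weights (assumed rule).
pts = np.linspace(-3.0, 3.0, K)
w = np.full(K, pts[1] - pts[0])
w[[0, -1]] *= 0.5

# Evaluate the MLP once on all K^2 grid points; all heads reuse this pass.
Z, Y = np.meshgrid(pts, pts, indexing="ij")
grid = np.stack([Z.ravel(), Y.ravel()], axis=1)   # (K^2, 2)
out = ff_mlp(grid)                                # (K^2, 4)

# Heads -> 2 folds x 2 inputs, each a K x K matrix; weight the integrated y axis.
W_cp = out.T.reshape(2, 2, K, K) * w[None, None, None, :]

def folded_cp_layer(W, X):
    """X: (folds, inputs, K) -> (folds, K), the product over inputs of W[f, i] @ X[f, i]."""
    return np.einsum("fiok,fik->fio", W, X).prod(axis=1)

X = rng.random((2, 2, K))  # toy outputs of the child layers
print(folded_cp_layer(W_cp, X).shape)  # (2, 8)
```

The trunk runs once on the $K^2$ grid points, and all four heads read from the same hidden activations. The MLP is therefore evaluated $K^2$ times rather than $4K^2$, in line with the caption's remark.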

Theorems & Definitions (7)

  • Definition 1: Region Graph (RG) dennis2012learning
  • Definition 2: Tensorized Circuit loconte2024relationship, peharz2020random
  • Definition 3: Circuit choiprobabilistic, vergari2021compositional
  • Definition 4: Probabilistic Circuit
  • Definition 5: Smoothness
  • Definition 6: Decomposability
  • Definition 7: Structured-decomposability pipatsrisawat2008new, darwiche2009modeling
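The sketch below is a hedged illustration of Definition 1 that validates a toy RG. It assumes the standard formulation: a rooted DAG of regions and partitions in which every partition splits its region into disjoint, non-empty sub-regions that together cover it. The dict encoding and the `check_rg` helper are hypothetical, not the paper's code. The partition condition it enforces is what makes product units built from each partition decomposable, in the sense of Definition 6.

```python
from typing import Dict, FrozenSet, List

Region = FrozenSet[int]

def check_rg(root: Region, partitions: Dict[Region, List[List[Region]]]) -> bool:
    """Return True if every partition reachable from `root` splits its region
    into >= 2 disjoint, non-empty sub-regions whose union is the region, and
    every partitioned region is reachable from `root`."""
    seen = set()
    stack = [root]
    while stack:
        region = stack.pop()
        if region in seen:
            continue
        seen.add(region)
        for children in partitions.get(region, []):
            # At least two non-empty children, so each child is a strict
            # subset of its parent and the graph cannot contain cycles.
            if len(children) < 2 or any(len(c) == 0 for c in children):
                return False
            # Covering: the children's union must be the parent region.
            if frozenset().union(*children) != region:
                return False
            # Disjointness: given covering, sizes add up iff no overlap.
            if sum(len(c) for c in children) != len(region):
                return False
            stack.extend(children)
    return seen >= set(partitions)

fs = frozenset
X = fs({1, 2, 3, 4})
rg = {
    X: [[fs({1, 2}), fs({3, 4})], [fs({1, 3}), fs({2, 4})]],
    fs({1, 2}): [[fs({1}), fs({2})]],
    fs({3, 4}): [[fs({3}), fs({4})]],
    fs({1, 3}): [[fs({1}), fs({3})]],
    fs({2, 4}): [[fs({2}), fs({4})]],
}
bad = {X: [[fs({1, 2}), fs({2, 3, 4})]]}
print(check_rg(X, rg))   # True
print(check_rg(X, bad))  # False: the children overlap on variable 2
```

The root is split by two different partitions, and the leaf regions {1}, ..., {4} are each reached along more than one path. The toy RG is therefore a DAG rather than a tree, which is the setting the paper targets.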