Sum of Squares Circuits

Lorenzo Loconte, Stefan Mengel, Antonio Vergari

TL;DR

This work studies the expressiveness of probabilistic circuits (PCs) that support tractable inference and introduces sums of compatible squares (SOCS), a non-monotonic class of PCs. It establishes an expressiveness hierarchy showing that monotonic PCs can be more expressive than squared PCs, while SOCS PCs can be exponentially more expressive than both, and it unifies models such as PSD models, Born Machines, and Inception PCs under a common SOCS framework. The authors introduce two function families (UPS and UTQ) that yield exponential separations, show that complex parameters further increase expressiveness within SOCS, and validate the approach empirically on distribution estimation over tabular and image data. The results suggest that SOCS offer a scalable, expressive, and practical approach to tractable probabilistic modeling, with potential connections to sum-of-squares (SOS) polynomials and other non-negative representations.
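One elementary identity helps explain why complex parameters fit inside SOCS: if a circuit outputs $c(\bm{x}) = a(\bm{x}) + i\,b(\bm{x})$, then $|c(\bm{x})|^2 = a(\bm{x})^2 + b(\bm{x})^2$, i.e., a sum of two real squares. The Python check below uses a hypothetical complex-valued function `c` over two Boolean variables in place of an actual complex circuit; it is a numerical illustration only, not the paper's proof.

```python
import itertools


# Hypothetical complex-valued function over two Boolean variables, standing in
# for a complex circuit c(x) = a(x) + i * b(x); the coefficients are arbitrary.
def c(x1: int, x2: int) -> complex:
    return (0.5 - 1.2j) * x1 + (-0.3 + 0.7j) * (1 - x1) * x2 + (1.1 + 0.2j) * (1 - x2)


for x1, x2 in itertools.product([0, 1], repeat=2):
    z = c(x1, x2)
    modulus_square = (z * z.conjugate()).real     # |c(x)|^2, always >= 0
    two_real_squares = z.real ** 2 + z.imag ** 2  # a(x)^2 + b(x)^2
    assert abs(modulus_square - two_real_squares) < 1e-12
    print((x1, x2), round(modulus_square, 4))
```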

Abstract

Designing expressive generative models that support exact and efficient inference is a core question in probabilistic ML. Probabilistic circuits (PCs) offer a framework where this tractability-vs-expressiveness trade-off can be analyzed theoretically. Recently, squared PCs encoding subtractive mixtures via negative parameters have emerged as tractable models that can be exponentially more expressive than monotonic PCs, i.e., PCs with positive parameters only. In this paper, we provide a more precise theoretical characterization of the expressiveness relationships among these models. First, we prove that squared PCs can be less expressive than monotonic ones. Second, we formalize a novel class of PCs -- sum of squares PCs -- that can be exponentially more expressive than both squared and monotonic PCs. Around sum of squares PCs, we build an expressiveness hierarchy that allows us to precisely unify and separate different tractable model classes such as Born Machines and PSD models, and other recently introduced tractable probabilistic models by using complex parameters. Finally, we empirically show the effectiveness of sum of squares circuits in performing distribution estimation.
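To make the abstract's notions concrete, here is a deliberately tiny Python sketch, not the paper's implementation: a shallow "circuit" over Boolean variables whose components are fully factorized, so that every square shares the same structure. It illustrates that squaring keeps the output non-negative even with negative weights, and that the normalization constant of a sum of compatible squares can be computed by multiplying per-variable sums instead of enumerating all $2^d$ assignments. All weights and factor values are hypothetical.

```python
import itertools
import math

# Toy "squared PC": c(x) = sum_i w[i] * prod_j f[i][j][x_j] over d Boolean
# variables. Weights w may be negative (a subtractive mixture); every component
# is fully factorized, so all squares share one structure (compatibility).
# f[i][j] = [value at x_j = 0, value at x_j = 1] (hypothetical numbers).


def evaluate(w, f, x):
    """c(x) for one assignment x (a tuple of 0/1 values)."""
    return sum(w_i * math.prod(f_i[j][x_j] for j, x_j in enumerate(x))
               for w_i, f_i in zip(w, f))


def partition_brute_force(w, f, d):
    """Z = sum_x c(x)^2 by enumerating all 2^d assignments."""
    return sum(evaluate(w, f, x) ** 2
               for x in itertools.product([0, 1], repeat=d))


def partition_squared(w, f, d):
    """Z = sum_{i,k} w_i w_k prod_j sum_v f_ij(v) f_kj(v), in O(K^2 d) time."""
    K = len(w)
    return sum(
        w[i] * w[k] * math.prod(
            sum(f[i][j][v] * f[k][j][v] for v in (0, 1)) for j in range(d))
        for i in range(K) for k in range(K))


d = 3
# A sum of two compatible squares (r = 2), each with a negative weight.
squares = [
    ([1.0, -0.8], [[[0.9, 0.1]] * d, [[0.2, 0.8]] * d]),
    ([0.5, 0.5, -0.3], [[[0.6, 0.4]] * d, [[0.3, 0.7]] * d, [[0.5, 0.5]] * d]),
]
Z = sum(partition_squared(w, f, d) for w, f in squares)
p = {x: sum(evaluate(w, f, x) ** 2 for w, f in squares) / Z
     for x in itertools.product([0, 1], repeat=d)}
print(round(Z, 6), min(p.values()) >= 0, round(sum(p.values()), 6))
```

`partition_brute_force` is only there to check `partition_squared`. Real squared PCs are deep circuits, where, roughly speaking, the same factorization argument is applied recursively over the shared structure.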

Paper Structure

This paper contains 68 sections, 26 theorems, 58 equations, 11 figures, 5 tables, 1 algorithm.

Key Result

Theorem 0

There is a class of non-negative functions $\mathcal{F}$ over $d$ variables $\bm{\mathrm{X}}$ that can be represented as a PC $c^2\in\pm^2_{\mathbb{R}}$ with size $|c^2|\in\mathcal{O}(d^2)$. However, the smallest monotonic and structured PC computing any $F\in\mathcal{F}$ has size at least $2^{\Omega(d)}$.
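The quadratic size in Theorem 0 is in line with the generic cost of squaring. Below is a sketch of the recursive product of two compatible circuits (vergari2021compositional), where $n$ and $m$ are units with the same scope and $\theta_i$, $\phi_j$ are sum-unit weights; $c^2 = c\cdot c$ is the special case of multiplying a structured-decomposable circuit by itself.

```latex
\begin{aligned}
\Big(\sum_{i} \theta_i\, n_i\Big)\cdot\Big(\sum_{j} \phi_j\, m_j\Big)
  &= \sum_{i}\sum_{j} \theta_i \phi_j\,(n_i \cdot m_j)
  && \text{sum units}\\
(n_1 \times n_2)\cdot(m_1 \times m_2)
  &= (n_1 \cdot m_1) \times (n_2 \cdot m_2)
  && \text{product units splitting the scope in the same way}\\
|c \cdot c| &\in \mathcal{O}(|c|^2)
  && \text{each pair of units is multiplied at most once}
\end{aligned}
```

Under this bound, a circuit $c$ of size $\mathcal{O}(d)$ yields a squared circuit of size $\mathcal{O}(d^2)$. This is one way to read the upper bound in Theorem 0, not necessarily the construction used in its proof.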

Figures (11)

  • Figure 1: For each circuit class $\mathcal{C}$ (described in the summary table of circuit classes), a rectangle illustrates the set of functions that can be efficiently computed by a circuit in $\mathcal{C}$. The $\mathsf{SUM}$, $\mathsf{UPS}$, and $\mathsf{UTQ}$ functions are introduced in this paper to show exponential separations between the classes $+_{\mathsf{sd}}$, $\pm^2_{\mathbb{R}}$, and $\Sigma_{\mathsf{cmp}}^2$. We mark classes that coincide in expressiveness with $=$, and discuss some open questions about this hierarchy in the section on limitations and related work.
  • Figure 2: Monotonic PCs ($+_{\mathsf{sd}}$) can perform better on average than a single real squared PC ($\pm^2_{\mathbb{R}}$), but worse than a single complex squared PC ($\pm^2_{\mathbb{C}}$) and worse than SOCS PCs ($\Sigma_{\mathsf{cmp},\mathbb{R}}^2$ and $\Sigma_{\mathsf{cmp},\mathbb{C}}^2$) as the number of squares increases. For $+_{\mathsf{sd}}$, we use mixtures whose components are monotonic PCs. We show box plots of average test log-likelihoods over multiple runs. All PCs have approximately the same number of parameters (see main text). Details are in the appendix on the experimental configuration.
  • Figure 3: Complex squared PCs are more accurate estimators on image data. We show test BPD (bits per dimension; lower is better) versus the number of learnable parameters of structured monotonic PCs ($+_{\mathsf{sd}}$), squared PCs with real ($\pm^2_{\mathbb{R}}$) and complex ($\pm^2_{\mathbb{C}}$) parameters (complex parameters are counted twice, for their real and imaginary parts), and the product of a monotonic PC and a complex squared PC ($+_{\mathsf{sd}}\cdot\pm^2_{\mathbb{C}}$, see its definition in the paper). We report the area between the minimum and maximum BPD obtained from 5 independent runs with different seeds.
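Figure 3 reports test BPD. Under the standard definition, assumed here since the caption does not spell it out, BPD is the negative base-2 log-likelihood of a sample divided by its number of dimensions $d$ (pixels); with log-likelihoods in nats this is $-\log p(\bm{x}) / (d \ln 2)$. A minimal Python sketch, with a hypothetical image size:

```python
import math


def bits_per_dimension(log_likelihood: float, num_dims: int) -> float:
    """Convert log p(x), in nats, of one d-dimensional sample into bits per dimension."""
    return -log_likelihood / (num_dims * math.log(2))


# Sanity check: a uniform model over 8-bit pixels (256 values each) assigns
# log p(x) = -d * ln(256), which corresponds to 8 bits per dimension.
d = 28 * 28  # hypothetical image size
uniform_ll = -d * math.log(256)
print(bits_per_dimension(uniform_ll, d))
```

Averaging this quantity over the test images gives the test BPD; lower values mean the model assigns higher probability to the test data.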
  • Figure A.1: A structured-decomposable circuit $c$ over Boolean variables $\bm{\mathrm{X}}=\{X_1,\ldots,X_4\}$. An input unit over variable $X_i$ computes one of the indicator functions $\bm{1}\!\left\{X_i\right\} = \bm{1}\!\left\{X_i = 1\right\}$ and $\bm{1}\!\left\{\neg X_i\right\} = \bm{1}\!\left\{X_i = 0\right\}$. Sum unit parameters are not shown for simplicity, and product units having the same scope are highlighted with the same color. The scope decompositions induced by the product units are $\bm{\mathrm{X}}\rightarrow(\{X_1,X_2,X_3\},\{X_4\})$ (blue), $\{X_1,X_2,X_3\}\rightarrow(\{X_1,X_2\},\{X_3\})$ (green), and $\{X_1,X_2\}\rightarrow(\{X_1\},\{X_2\})$ (red). A feed-forward evaluation first evaluates the input units on an assignment to $\bm{\mathrm{X}}$ and then evaluates the remaining units from the inputs towards the output unit, which computes $c(\bm{\mathrm{X}})$.
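The feed-forward evaluation described in Figure A.1 can be sketched in Python as follows. The circuit is hypothetical: it follows the caption's scope decompositions ($\bm{\mathrm{X}}\rightarrow(\{X_1,X_2,X_3\},\{X_4\})$, $\{X_1,X_2,X_3\}\rightarrow(\{X_1,X_2\},\{X_3\})$, $\{X_1,X_2\}\rightarrow(\{X_1\},\{X_2\})$), but its units and sum weights are invented, since the figure does not show parameters. Units are stored in topological order, so one left-to-right pass evaluates the inputs first and the output last.

```python
import itertools
import math

# Hypothetical circuit; units listed in topological order (children first).
# ("input", var, v) computes 1{var = v}; ("sum", [(weight, child), ...]);
# ("product", [child, ...]).
UNITS = [
    ("input", "X1", 0), ("input", "X1", 1),   # 0, 1
    ("input", "X2", 0), ("input", "X2", 1),   # 2, 3
    ("input", "X3", 0), ("input", "X3", 1),   # 4, 5
    ("input", "X4", 0), ("input", "X4", 1),   # 6, 7
    ("sum", [(0.3, 0), (0.7, 1)]),            # 8:  over {X1}
    ("sum", [(0.6, 2), (0.4, 3)]),            # 9:  over {X2}
    ("product", [8, 9]),                      # 10: {X1,X2} -> ({X1},{X2})
    ("product", [0, 3]),                      # 11: same decomposition
    ("sum", [(0.5, 10), (0.5, 11)]),          # 12: over {X1,X2}
    ("sum", [(0.2, 4), (0.8, 5)]),            # 13: over {X3}
    ("product", [12, 13]),                    # 14: {X1,X2,X3} -> ({X1,X2},{X3})
    ("sum", [(0.9, 6), (0.1, 7)]),            # 15: over {X4}
    ("product", [14, 15]),                    # 16: X -> ({X1,X2,X3},{X4})
    ("product", [14, 7]),                     # 17: same decomposition
    ("sum", [(0.4, 16), (0.6, 17)]),          # 18: output unit
]
NAMES = ["X1", "X2", "X3", "X4"]


def evaluate(units, x):
    """Feed-forward pass: inputs first, then each unit from its children."""
    values = []
    for unit in units:
        if unit[0] == "input":
            values.append(1.0 if x[unit[1]] == unit[2] else 0.0)
        elif unit[0] == "product":
            values.append(math.prod(values[i] for i in unit[1]))
        else:  # sum unit
            values.append(sum(w * values[i] for w, i in unit[1]))
    return values[-1]  # the last unit is the output


print(evaluate(UNITS, {"X1": 1, "X2": 0, "X3": 1, "X4": 0}))
total = sum(evaluate(UNITS, dict(zip(NAMES, bits)))
            for bits in itertools.product([0, 1], repeat=4))
print(total)
```

Because every sum unit has normalized weights and the circuit is smooth and decomposable, the outputs sum to 1 over the 16 assignments, which the last lines check by enumeration.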
  • Figure A.2: Iterative decomposition of a smooth and decomposable circuit. Given the smooth and decomposable circuit $c$ shown in Figure A.1, we choose a unit $n_0$ whose scope is balanced (in red, with scope $\{X_1,X_2\}$) by traversing the computational graph from the output unit towards the inputs (left). Let $z_0$ be the multilinear polynomial that $n_0$ would compute, which we label in red in circuit $c_0'$. Then, the unit $n_0$ is removed by setting $z_0 = 0$ and pruning the circuit accordingly, which yields the circuit $c_1'$ (right). Note that $c_1'$ inherits the structural properties of $c_0'$, and thus of $c$. By repeating the same process, we choose and prune the unit $n_1$ in $c_1'$, resulting in the "empty" circuit $c_2$ (not shown).
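The pruning step in Figure A.2 can be sketched in Python on a small hypothetical circuit (not the one in the figure). Two choices are our assumptions: a unit counts as balanced when its scope size lies in $[d/3, 2d/3]$, and "traversing from the output towards the inputs" is approximated by scanning units in reverse topological order. Removing $n_0$ sets its output to zero; product units with a removed child are then removed too, and sum units drop removed children (and disappear if none remain).

```python
import itertools
import math

# Hypothetical smooth and decomposable circuit over X1..X4, in topological
# order. ("input", var, v) computes 1{var = v}; removed units become None.
CIRCUIT = [
    ("input", "X1", 0), ("input", "X1", 1), ("input", "X2", 0), ("input", "X2", 1),
    ("input", "X3", 0), ("input", "X3", 1), ("input", "X4", 0), ("input", "X4", 1),
    ("product", [1, 3]),              # 8:  {X1, X2}
    ("product", [0, 2]),              # 9:  {X1, X2}
    ("sum", [(0.7, 8), (0.3, 9)]),    # 10: {X1, X2}
    ("sum", [(0.2, 8), (0.8, 9)]),    # 11: {X1, X2}
    ("sum", [(0.5, 4), (0.5, 5)]),    # 12: {X3}
    ("product", [10, 12]),            # 13: {X1, X2, X3}
    ("product", [11, 12]),            # 14: {X1, X2, X3}
    ("sum", [(0.6, 13), (0.4, 14)]),  # 15: {X1, X2, X3}
    ("sum", [(0.9, 6), (0.1, 7)]),    # 16: {X4}
    ("product", [15, 16]),            # 17: output, scope X
]
NAMES = ["X1", "X2", "X3", "X4"]


def children(unit):
    if unit[0] == "input":
        return []
    return unit[1] if unit[0] == "product" else [i for _, i in unit[1]]


def scopes(units):
    result = []
    for unit in units:
        if unit is None:
            result.append(frozenset())
        elif unit[0] == "input":
            result.append(frozenset([unit[1]]))
        else:
            result.append(frozenset().union(*(result[i] for i in children(unit))))
    return result


def evaluate(units, x, zero=None):
    """Output value; removed units and the unit with index `zero` output 0."""
    values = []
    for idx, unit in enumerate(units):
        if unit is None or idx == zero:
            values.append(0.0)
        elif unit[0] == "input":
            values.append(1.0 if x[unit[1]] == unit[2] else 0.0)
        elif unit[0] == "product":
            values.append(math.prod(values[i] for i in unit[1]))
        else:
            values.append(sum(w * values[i] for w, i in unit[1]))
    return values[-1]


def choose_balanced(units, d):
    """First unit, scanning from the output, with d/3 <= |scope| <= 2d/3."""
    sc = scopes(units)
    for idx in reversed(range(len(units))):
        if units[idx] is not None and d / 3 <= len(sc[idx]) <= 2 * d / 3:
            return idx
    return None


def prune(units, target):
    """Remove `target` (set z_0 = 0) and propagate the removal upwards."""
    pruned = list(units)
    pruned[target] = None
    for idx in range(target + 1, len(pruned)):
        unit = pruned[idx]
        if unit is None or unit[0] == "input":
            continue
        if unit[0] == "product":
            if any(pruned[i] is None for i in unit[1]):
                pruned[idx] = None
        else:
            kept = [(w, i) for w, i in unit[1] if pruned[i] is not None]
            pruned[idx] = ("sum", kept) if kept else None
    return pruned


circuit, removed = CIRCUIT, []
while circuit[-1] is not None:
    target = choose_balanced(circuit, len(NAMES))
    if target is None:
        break
    pruned = prune(circuit, target)
    # The pruned circuit computes the previous one with z_0 set to 0.
    for bits in itertools.product([0, 1], repeat=len(NAMES)):
        x = dict(zip(NAMES, bits))
        assert math.isclose(evaluate(pruned, x), evaluate(circuit, x, zero=target),
                            abs_tol=1e-12)
    removed.append(target)
    circuit = pruned
print(removed, circuit[-1] is None)
```

As in the caption, two rounds of choosing and pruning a balanced unit leave an "empty" circuit whose output unit has been removed.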
  • ...and 6 more figures

Theorems & Definitions (63)

  • Definition 1: Circuit (vergari2021compositional)
  • Definition 2: Smoothness and decomposability (darwiche2002knowledge)
  • Definition 3: Compatibility (vergari2021compositional)
  • Theorem 0 (loconte2024subtractive)
  • Theorem 1
  • Definition 4
  • Theorem 2
  • Definition 5
  • Corollary 1
  • Theorem 3
  • ...and 53 more