On the Expressive Power of Tree-Structured Probabilistic Circuits

Lang Yin, Han Zhao

TL;DR

The conjectured exponential gap between DAG-structured and tree-structured PCs is answered in the negative: for $n$ variables, a distribution computed by a polynomial-size DAG-structured PC can also be computed by an equivalent tree of quasi-polynomial size $n^{O(\log n)}$.

Abstract

Probabilistic circuits (PCs) have emerged as a powerful framework to compactly represent probability distributions for efficient and exact probabilistic inference. It has been shown that a PC with a general directed acyclic graph (DAG) structure can be understood as a mixture of exponentially (in its height) many components, each of which is a product distribution over univariate marginals. However, existing structure learning algorithms for PCs often generate tree-structured circuits, or use tree-structured circuits as intermediate steps that are then compressed into DAG-structured circuits. This leads to the intriguing question of whether there exists an exponential gap between DAGs and trees for the PC structure. In this paper, we provide a negative answer to this question by proving that, for $n$ variables, there exists a quasi-polynomial upper bound $n^{O(\log n)}$ on the size of an equivalent tree computing the same probability distribution. On the other hand, we also show that, given a depth restriction on the tree, there is a super-polynomial separation between tree- and DAG-structured PCs. Our work takes an important step towards understanding the expressive power of tree-structured PCs, and our techniques may be of independent interest in the study of structure learning algorithms for PCs.
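
To see why the DAG-versus-tree question is not trivial, the following minimal sketch counts nodes when a DAG with heavily shared sub-circuits is naively unrolled into a tree by duplicating every shared node. Everything here is an illustrative assumption: the dictionary encoding, the helper names `layered_dag` and `unrolled_tree_size`, and the toy graph itself, which ignores node types, edge weights, decomposability, and smoothness. It is not the construction used in the paper; it only shows that naive unrolling can blow up exponentially, which is the behavior the quasi-polynomial bound avoids.

```python
from functools import lru_cache

# Hypothetical toy encoding: node name -> list of child names (leaves map to []).
def layered_dag(depth):
    """Two nodes per layer; every node points to both nodes of the next layer."""
    dag = {}
    for d in range(depth):
        for side in ("a", "b"):
            dag[f"{side}{d}"] = [f"a{d + 1}", f"b{d + 1}"]
    # The last layer consists of two leaves.
    dag[f"a{depth}"] = []
    dag[f"b{depth}"] = []
    return dag

def unrolled_tree_size(dag, root):
    """Number of nodes in the tree obtained by copying every shared sub-DAG."""
    @lru_cache(maxsize=None)
    def size(node):
        # A node contributes itself plus a separate copy of each child's subtree.
        return 1 + sum(size(child) for child in dag[node])
    return size(root)

if __name__ == "__main__":
    for depth in (2, 5, 10):
        dag = layered_dag(depth)
        print(depth, len(dag), unrolled_tree_size(dag, "a0"))
```

In this toy graph the DAG has $2(d+1)$ nodes at depth $d$, while the unrolled tree rooted at `a0` has $2^{d+1}-1$ nodes, so the tree size is exponential in the depth of the DAG.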

Paper Structure (45 sections, 19 theorems, 23 equations, 3 figures, 5 algorithms)

This paper contains 45 sections, 19 theorems, 23 equations, 3 figures, 5 algorithms.

Key Result

Theorem 1.1

Given a network polynomial of $n$ variables, if this polynomial can be computed efficiently by a PC of size $\mathrm{poly}(n)$, then there exists an equivalent tree-structured PC of depth $O(\log n)$ and of size $n^{O(\log n)}$ that computes the same network polynomial.
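
For orientation, the quasi-polynomial size in this statement sits strictly between polynomial and exponential growth. The identity below is a standard rewriting; the constant $c > 0$ stands in for whatever constant is hidden in the $O(\cdot)$ notation and is an assumption made purely for illustration, since its value is not given here.

```latex
n^{c \log_2 n} = 2^{c (\log_2 n)^2},
\qquad
n^{k} = 2^{k \log_2 n} \;\ll\; 2^{c (\log_2 n)^2} \;\ll\; 2^{n}
\quad \text{for any fixed } k \text{ and all sufficiently large } n.
```

As a concrete instance with $c = 1$ and $n = 1024$: $n^{\log_2 n} = 2^{100}$, compared with $n^3 = 2^{30}$ and $2^n = 2^{1024}$.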

Figures (3)

  • Figure 1: Partial derivatives of sum nodes.
  • Figure 2: The process of transforming a non-binary DAG-structured PC to a binary one that computes the identical network polynomial. We omit the edge weights for simplicity.
  • Figure 3: The process of converting an arbitrary DAG to a DAG with depth restriction. The red nodes are those in $\mathbf{G}_2$ and their relationships imply the computational procedure.

Theorems & Definitions (31)

  • Theorem 1.1: Informal
  • Theorem 1.2: Informal
  • Definition 2.1: Decomposability and Smoothness
  • Definition 2.2: Partial Derivative
  • Definition 2.3: Parse Tree
  • Definition 2.4: Monotonicity
  • Theorem 3.1
  • Lemma 3.2
  • Lemma 3.2
  • Lemma 3.2
  • ...and 21 more