On the Hardness of Approximating Distributions with Tractable Probabilistic Models

John Leland, YooJung Choi

TL;DR

This work investigates whether allowing a small approximation error can ease the tradeoff between expressivity and inference efficiency in tractable probabilistic models, focusing on probabilistic circuits (PCs). It proves that approximating an arbitrary distribution within a bounded $f$-divergence using any model with tractable marginals is $\mathsf{NP}$-hard, and it establishes an unconditional exponential size gap, under approximation, between decomposable PCs and PCs that are both decomposable and deterministic. The paper also analyzes how divergence bounds relate to guarantees for marginal and MAP inference, showing that bounds on total variation (TV) distance and on $f$-divergences can imply useful absolute approximations for these queries, but that such guarantees do not extend to all query types. Overall, the results delineate fundamental limits on learning compact PCs for approximate distribution modeling and guide future work on when and how approximate compilation can support reliable approximate inference.

Abstract

A fundamental challenge in probabilistic modeling is to balance expressivity and inference efficiency. Tractable probabilistic models (TPMs) aim to directly address this tradeoff by imposing constraints that guarantee efficient inference for certain queries while maintaining expressivity. In particular, probabilistic circuits (PCs) provide a unifying framework for many TPMs by characterizing families of models as circuits satisfying different structural properties. Because the complexity of inference on PCs is a function of the circuit size, understanding the size requirements of different families of PCs is fundamental in mapping the tradeoff between tractability and expressive efficiency. However, the study of the expressive efficiency of circuits is often concerned with exact representations, which may not align with model learning, where we look to approximate the underlying data distribution closely under some distance measure. Moreover, due to the hardness of inference tasks, exactly representing distributions while supporting tractable inference often incurs exponential size blow-ups. In this paper, we consider a natural, yet so far underexplored, question: can we avoid such size blow-ups by allowing for some small approximation error? We study approximating distributions with probabilistic circuits with guarantees based on $f$-divergences, and analyze which inference queries remain well-approximated under this framework. We show that approximating an arbitrary distribution with bounded $f$-divergence is $\mathsf{NP}$-hard for any model that can tractably compute marginals. In addition, we prove an exponential size gap for approximation between the class of decomposable PCs and that of decomposable and deterministic PCs.

Paper Structure

This paper contains 19 sections, 12 theorems, 13 equations, 1 figure, 1 table.

Key Result

Theorem 3.1

Let $\epsilon > 0$ and $P, Q$ be two probability distributions over $\mathbf{X}$. If $Q$ is a relative approximator of marginals for $P$, then $D_{\mathsf{TV}}(P \Vert Q) \leq \frac{\epsilon}{2}$.
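
As a rough numerical illustration of Theorem 3.1, the sketch below checks the bound on small discrete distributions. The page does not state the definition of a relative approximator, so the code assumes the common form $\frac{1}{1+\epsilon} P(\mathbf{x}) \leq Q(\mathbf{x}) \leq (1+\epsilon) P(\mathbf{x})$ and applies it only to complete assignments, which are a special case of marginals. All function names are hypothetical and chosen for this example; this is a sanity check under those assumptions, not the paper's proof.

```python
import math
import random


def total_variation(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def is_relative_approximator(p, q, eps, tol=1e-12):
    """ASSUMED definition: p_i / (1 + eps) <= q_i <= (1 + eps) * p_i for all i."""
    return all(
        pi / (1 + eps) - tol <= qi <= (1 + eps) * pi + tol
        for pi, qi in zip(p, q)
    )


def random_distribution(n, rng):
    """Sample a strictly positive distribution over n outcomes."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def perturb_within_relative_bound(p, eps, rng):
    """Build q from p so that q stays within the assumed relative bound.

    Each entry is scaled by a factor in [1/sqrt(1+eps), sqrt(1+eps)].
    The normalizer then lies in the same interval, so every ratio q_i / p_i
    ends up in [1/(1+eps), 1+eps].
    """
    hi = math.sqrt(1 + eps)
    lo = 1 / hi
    scaled = [pi * rng.uniform(lo, hi) for pi in p]
    z = sum(scaled)
    return [s / z for s in scaled]


def check_bound(trials=1000, n=8, eps=0.1, seed=0):
    """Return True if D_TV(p, q) <= eps / 2 held in every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        p = random_distribution(n, rng)
        q = perturb_within_relative_bound(p, eps, rng)
        assert is_relative_approximator(p, q, eps)
        if total_variation(p, q) > eps / 2 + 1e-12:
            return False
    return True


if __name__ == "__main__":
    print("bound held in all trials:", check_bound())
```

Under the assumed definition the bound is easy to see directly: both inequalities give $|P(\mathbf{x}) - Q(\mathbf{x})| \leq \epsilon P(\mathbf{x})$, and summing over $\mathbf{x}$ and halving yields $D_{\mathsf{TV}}(P \Vert Q) \leq \frac{\epsilon}{2}$.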

Figures (1)

  • Figure 1: A smooth, decomposable, and deterministic PC (weights shown only for the root for conciseness).

Theorems & Definitions (35)

  • Definition 2.1: Probabilistic circuits
  • Definition 2.2: Smoothness and decomposability
  • Definition 2.3: Determinism
  • Definition 2.4: $f$-divergence [polyanskiy_fdivergence]
  • Definition 2.5: Total variation distance
  • Definition 2.6: $\epsilon$-$D$-Approximation
  • Definition 2.7: $k$-convex $f$-divergence [melbourne2020strongly]
  • Theorem 3.1: Relative approximation implies bounded $D_{\mathsf{TV}}(P \Vert Q)$
  • proof
  • Proposition 3.2: Bounded $D_{\mathsf{TV}}(P \Vert Q)$ does not imply relative approximation
  • ...and 25 more