
On Faster Marginalization with Squared Circuits via Orthonormalization

Lorenzo Loconte, Antonio Vergari

TL;DR

The paper addresses the high cost of marginalization in squared circuits by introducing an orthonormal parameterization, inspired by tensor-network canonical forms, which ensures that the squared circuit is already normalized (i.e., $Z=1$). It develops sufficient conditions for orthonormality, namely orthonormal input functions and semi-unitary sum-layer parameter matrices, together with a QR-based orthonormalization procedure. These enable a Marginalize algorithm that computes marginals with improved complexity $O(|\phi_{\bm{\mathrm{Y}}}| S + |\phi_{\bm{\mathrm{Y}},\bm{\mathrm{Z}}}| S^2)$. The key contributions include (i) a principled parameterization that preserves expressiveness for many circuit classes and (ii) a faster, structure-aware marginalization method, along with a procedure to convert non-orthonormal circuits into orthonormal ones without loss of distributional power. These advances could broaden the applicability of squared circuits to tasks requiring fast marginalization, such as lossless compression and probabilistic reasoning in deep learning systems, by enabling efficient exact inference with normalized distributions.
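To make the marginalization idea concrete, here is a minimal NumPy sketch on the four-variable circuit of Figure 1. It assumes discrete variables (so integrals become sums), real-valued parameters (so $\dagger$ is the transpose), and orthonormality obtained from random QR factors; the tree shape, the sizes and all helper names (`orthonormal_columns`, `s_y`, `s_z`, `marginal_naive`, `marginal_ortho`) are hypothetical illustrations, not the paper's Marginalize implementation. It compares brute-force summation of $|c|^2$ over $\bm{\mathrm{Z}}$ with the shortcut that replaces the squared sum of the $\bm{\mathrm{Z}}$ sub-circuit by $\bm{\mathrm{I}}_K$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
d, K = 3, 2  # hypothetical: states per discrete variable X_i, layer width


def orthonormal_columns(n, m):
    """Random n x m matrix Q with Q^T Q = I_m (requires n >= m), via reduced QR."""
    q, _ = np.linalg.qr(rng.normal(size=(n, m)))
    return q


# Toy orthonormal circuit over X1..X4 with tree ((X1, X2), (X3, X4))
F = [orthonormal_columns(d, K) for _ in range(4)]  # input layers: f_i(x) = F[i][x]
W12 = orthonormal_columns(K, K).T  # orthogonal (hence semi-unitary) sum layers
W34 = orthonormal_columns(K, K).T
w = orthonormal_columns(K, 1).T  # 1 x K root sum layer with a unit-norm row


def s_y(x1, x2):
    """Sub-circuit depending only on Y = {X1, X2} (phi_Y)."""
    return W12 @ (F[0][x1] * F[1][x2])


def s_z(x3, x4):
    """Sub-circuit depending only on Z = {X3, X4} (phi_Z)."""
    return W34 @ (F[2][x3] * F[3][x4])


def c(x1, x2, x3, x4):
    """Root Hadamard layer followed by the root sum layer (phi_{Y,Z})."""
    return (w @ (s_y(x1, x2) * s_z(x3, x4)))[0]


def marginal_naive(x1, x2):
    """Sum |c|^2 over every assignment of Z."""
    return sum(c(x1, x2, x3, x4) ** 2
               for x3, x4 in itertools.product(range(d), repeat=2))


def marginal_ortho(x1, x2):
    """Use orthonormality: sum_z s_z s_z^T = I_K, so phi_Z is never evaluated."""
    s = s_y(x1, x2)
    M = np.outer(s, s) * np.eye(K)  # (s_y s_y^T) ⊙ I_K at the squared root Hadamard layer
    return (w @ M @ w.T)[0, 0]


print(np.isclose(marginal_naive(0, 2), marginal_ortho(0, 2)))  # True
```

The shortcut never enumerates assignments of $X_3, X_4$ and never evaluates $\phi_{\bm{\mathrm{Z}}}$: it evaluates the $\bm{\mathrm{Y}}$ part once and squares only the root layer, which mirrors the structure of the cost stated above.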

Abstract

Squared tensor networks (TNs) and their generalization as parameterized computational graphs -- squared circuits -- have recently been used as expressive distribution estimators in high dimensions. However, the squaring operation introduces additional complexity when marginalizing variables or computing the partition function, which hinders their usage in machine learning applications. Canonical forms of popular TNs are parameterized via unitary matrices so as to simplify the computation of particular marginals, but they cannot be mapped to general circuits, since these might not correspond to a known TN. Inspired by TN canonical forms, we show how to parameterize squared circuits to ensure they encode already normalized distributions. We then use this parameterization to devise an algorithm that computes any marginal of squared circuits more efficiently than a previously known one. We conclude by formally showing that the proposed parameterization comes with no loss of expressiveness for many circuit classes.

Paper Structure

This paper contains 25 sections, 3 theorems, 14 equations, 2 figures, 3 algorithms.

Key Result

Proposition 1

Let $c$ be a structured-decomposable tensorized circuit over variables $\bm{\mathrm{X}}$. If $c$ is orthonormal, then its square $|c|^2$ encodes a normalized distribution, i.e., $Z = \int_{\mathsf{dom}(\bm{\mathrm{X}})} |c(\bm{\mathrm{x}})|^2 \,\mathrm{d}\bm{\mathrm{x}} = 1$.
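Proposition 1 can be checked numerically on a toy circuit. The sketch below is an illustration under stated assumptions, not the paper's construction: variables are discrete with $d$ states (so $Z$ is a sum over all assignments), parameters are real (so $\dagger$ is the transpose), input layers are orthonormal in the sense $\sum_x f_i(x) f_i(x)^\top = \bm{\mathrm{I}}_K$, and every sum layer has orthonormal rows, with both properties obtained from QR factors of random matrices. The helper `orthonormal_columns` and the tree $((X_1,X_2),(X_3,X_4))$ are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
d, K = 3, 2  # hypothetical: states per discrete variable X_i, layer width


def orthonormal_columns(n, m):
    """Random n x m matrix Q with Q^T Q = I_m (requires n >= m), via reduced QR."""
    q, _ = np.linalg.qr(rng.normal(size=(n, m)))
    return q


# Orthonormal input layers: sum_x f_i(x) f_i(x)^T = F_i^T F_i = I_K
F = [orthonormal_columns(d, K) for _ in range(4)]
# Sum layers with orthonormal rows (W W^T = I): square orthogonal inside, 1 x K at the root
W12 = orthonormal_columns(K, K).T
W34 = orthonormal_columns(K, K).T
w_root = orthonormal_columns(K, 1).T


def c(x):
    """Tree-shaped tensorized circuit: Hadamard layers followed by sum layers."""
    s12 = W12 @ (F[0][x[0]] * F[1][x[1]])  # depends on X1, X2 only
    s34 = W34 @ (F[2][x[2]] * F[3][x[3]])  # depends on X3, X4 only
    return (w_root @ (s12 * s34))[0]


# Partition function of the squared circuit, by brute-force enumeration
Z = sum(c(x) ** 2 for x in itertools.product(range(d), repeat=4))
print(np.isclose(Z, 1.0))  # True
```

Replacing any of the QR factors with an unconstrained random matrix would generally give $Z \neq 1$, in which case $Z$ has to be computed explicitly before $|c|^2$ can be used as a distribution.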

Figures (2)

  • Figure 1: Squared orthonormal PCs enable a more efficient marginalization algorithm. The left figure shows a tensorized circuit $c$ with a tree structure over $\bm{\mathrm{X}} = \{X_1,X_2,X_3,X_4\}$ with input, Hadamard and sum layers. We label the input layers with the vector-valued function they encode on a variable $X_i$. Consider computing the marginal likelihood $p(x_1,x_2) = \int_{\mathsf{dom}(X_3)\times \mathsf{dom}(X_4)} |c(x_1,x_2,x_3,x_4)|^2 \mathrm{d}x_3\mathrm{d}x_4$. We label groups of layers depending on $\bm{\mathrm{Y}}=\{X_1,X_2\}$ (red-ish, $\phi_{\bm{\mathrm{Y}}}$), on $\bm{\mathrm{Z}}=\{X_3,X_4\}$ (blue-ish, $\phi_{\bm{\mathrm{Z}}}$), and on both (green, $\phi_{\bm{\mathrm{Y}},\bm{\mathrm{Z}}}$). A naive algorithm computing $p(x_1,x_2)$ would (i) square the whole tensorized circuit as $c^2$, which quadratically increases the size of each layer, (ii) compute the integrals of the squared input layers over $\bm{\mathrm{Z}}$, and (iii) evaluate the rest of the squared layers (middle, from left to right). (right) Instead, if $c$ is orthonormal, the Marginalize algorithm avoids computing the integral of the sub-circuit depending on $\bm{\mathrm{Z}}$ (as it evaluates to the identity matrix $\bm{\mathrm{I}}_K$, in blue), and requires computing a single Kronecker product (orange) and squaring just the layers in $\phi_{\bm{\mathrm{Y}},\bm{\mathrm{Z}}}$ (green).
  • Figure B.1: The orthonormalization algorithm recursively makes the sum layer parameter matrices (semi-)unitary. Given a fragment of a tensorized circuit (left), our algorithm computes QR decompositions of the sum layer parameter matrices $\bm{\mathrm{V}}_1^\dagger$ and $\bm{\mathrm{V}}_2^\dagger$, thus yielding $\bm{\mathrm{V}}_1 = \bm{\mathrm{R}}_1^\dagger \bm{\mathrm{Q}}_1^\dagger$ and $\bm{\mathrm{V}}_2 = \bm{\mathrm{R}}_2^\dagger \bm{\mathrm{Q}}_2^\dagger$ (mid) (L9-13 in the algorithm). The matrices $\bm{\mathrm{R}}_1^\dagger,\bm{\mathrm{R}}_2^\dagger$ are propagated towards the subsequent Hadamard layer, where their face-splitting product $\bm{\mathrm{R}}_1^\dagger \bullet \bm{\mathrm{R}}_2^\dagger$ (Definition B.1) is computed (L21) and then multiplied into the parameter matrix $\bm{\mathrm{W}}$ (right) (L8). Note that the Hadamard product layer is replaced with a Kronecker product layer, which accounts for a polynomial increase in the layer size. The same procedure is then recursively applied to the parameter matrix $\bm{\mathrm{W}}(\bm{\mathrm{R}}_1^\dagger \bullet \bm{\mathrm{R}}_2^\dagger)$ (not shown).
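One step of the rewrite in Figure B.1 can be verified with the mixed-product property of the face-splitting product, $(\bm{\mathrm{A}}\bm{a}) \odot (\bm{\mathrm{B}}\bm{b}) = (\bm{\mathrm{A}} \bullet \bm{\mathrm{B}})(\bm{a} \otimes \bm{b})$, where row $i$ of $\bm{\mathrm{A}} \bullet \bm{\mathrm{B}}$ is the Kronecker product of the $i$-th rows of $\bm{\mathrm{A}}$ and $\bm{\mathrm{B}}$. The NumPy sketch below assumes real-valued parameters (so $\dagger$ is the transpose), a single parent sum layer $\bm{\mathrm{W}}$ with one output, and made-up layer sizes; `face_split` is a hypothetical helper, and the code covers only one non-recursive step, not the full orthonormalization algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
K_in, K = 4, 3  # hypothetical layer sizes; K <= K_in so Q^T can have orthonormal rows


def face_split(A, B):
    """Row-wise Kronecker (face-splitting) product: row i is kron(A[i], B[i])."""
    return np.einsum("ij,ik->ijk", A, B).reshape(A.shape[0], -1)


# Two child sum layers with unconstrained parameter matrices V1, V2 (K x K_in)
V1, V2 = rng.normal(size=(K, K_in)), rng.normal(size=(K, K_in))
W = rng.normal(size=(1, K))  # parent sum layer after the Hadamard layer
h1, h2 = rng.normal(size=K_in), rng.normal(size=K_in)  # outputs feeding the child sum layers

# QR of V^T gives V = R^T Q^T, where Q^T (K x K_in) has orthonormal rows
Q1, R1 = np.linalg.qr(V1.T)
Q2, R2 = np.linalg.qr(V2.T)

# Original fragment: sum layers -> Hadamard layer -> parent sum layer
original = W @ ((V1 @ h1) * (V2 @ h2))
# Rewritten fragment: semi-unitary sum layers -> Kronecker layer -> W (R1^T • R2^T)
W_new = W @ face_split(R1.T, R2.T)  # shape 1 x K^2
rewritten = W_new @ np.kron(Q1.T @ h1, Q2.T @ h2)

print(np.allclose(original, rewritten))  # True
```

The new parent matrix has $K^2$ columns instead of $K$, which is the polynomial growth the caption mentions when the Hadamard layer becomes a Kronecker layer.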

Theorems & Definitions (10)

  • Definition 1: Tensorized circuit
  • Definition 2: Layer-wise smoothness and decomposability (darwiche2002knowledge; loconte2024relationship)
  • Definition 3: Orthonormal circuits
  • Proposition 1
  • Theorem 1
  • Theorem 2
  • Proof
  • Proof
  • Proof
  • Definition B.1: Face-splitting product