How to Square Tensor Networks and Circuits Without Squaring Them

Lorenzo Loconte, Adrián Javaloy, Antonio Vergari

TL;DR

This work addresses the marginalization and normalization bottlenecks of squared tensor networks and squared probabilistic circuits by introducing orthogonality- and unitarity-based constraints. The authors develop a unitary, tensorized-circuit framework that generalizes canonical TN forms to circuits, enabling already-normalized squared distributions and linear-time marginalization even for non-structured decomposable factorizations. They provide a Marginalization algorithm with tight complexity bounds and demonstrate through experiments on image datasets and synthetic data that the proposed methods deliver efficiency gains without sacrificing expressiveness. The results suggest a broader potential for flexible, scalable Born-machine-like models and invite future work on new factorization structures that maintain tractable inference.

Abstract

Squared tensor networks (TNs) and their extension as computational graphs--squared circuits--have been used as expressive distribution estimators that still support closed-form marginalization. However, the squaring operation introduces additional complexity when computing the partition function or marginalizing variables, which hinders their applicability in ML. To solve this issue, canonical forms of TNs are parameterized via unitary matrices to simplify the computation of marginals. However, these canonical forms do not apply to circuits, as they can represent factorizations that do not directly map to a known TN. Inspired by the ideas of orthogonality in canonical forms and determinism in circuits enabling tractable maximization, we show how to parameterize squared circuits to overcome their marginalization overhead. Our parameterizations unlock efficient marginalization even in factorizations different from TNs, but encoded as circuits, whose structure would otherwise make marginalization computationally hard. Finally, our experiments on distribution estimation show how our proposed conditions in squared circuits come with no expressiveness loss, while enabling more efficient learning.

Paper Structure

This paper contains 27 sections, 16 theorems, 41 equations, 8 figures, 3 tables, 5 algorithms.

Key Result

Theorem 1

Let $c$ be a smooth, decomposable and orthogonal circuit over $\bm{\mathrm{X}}$. Then computing the partition function $Z = \int_{\mathsf{dom}(\bm{\mathrm{X}})} |c(\bm{\mathrm{x}})|^2 \,\mathrm{d}\bm{\mathrm{x}}$ can be done in time $\mathcal{O}(|c|)$.
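As a rough illustration of why orthogonality can help, the Python sketch below checks a hypothetical two-variable circuit, $c(x_1,x_2) = \sum_k w_k f_k(x_1) g_k(x_2)$, whose product units are mutually orthogonal because the $f_k$ are. The tables `f`, `g`, `w` and all function names are assumptions made for this example and are not taken from the paper. Under that assumption the cross terms in $|c|^2$ vanish, so $Z$ can be accumulated in a single pass over the circuit parameters instead of summing over every assignment.

```python
import itertools

# Hypothetical toy circuit over two discrete variables X1, X2 in {0, 1, 2}:
#   c(x1, x2) = sum_k w[k] * f[k][x1] * g[k][x2]
# The vectors f[k] are mutually orthogonal (not normalized), which makes the
# product units feeding the sum unit orthogonal functions. The g[k] are arbitrary.
f = [[1.0, 1.0, 0.0], [1.0, -1.0, 0.0], [0.0, 0.0, 2.0]]
g = [[0.5, 1.0, -1.0], [2.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
w = [0.3 + 0.4j, -1.0j, 2.0]  # sum weights may be complex


def evaluate(x1, x2):
    """Evaluate the circuit c at a single assignment."""
    return sum(w[k] * f[k][x1] * g[k][x2] for k in range(len(w)))


def partition_brute_force(domain_size=3):
    """Z = sum over all assignments of |c(x)|^2 (exponential in #variables)."""
    return sum(
        abs(evaluate(x1, x2)) ** 2
        for x1, x2 in itertools.product(range(domain_size), repeat=2)
    )


def sq_norm(v):
    """Squared L2 norm of a tabulated input function."""
    return sum(abs(a) ** 2 for a in v)


def partition_orthogonal():
    """Z in one pass: cross terms vanish because <f[k], f[j]> = 0 for k != j."""
    return sum(
        abs(w[k]) ** 2 * sq_norm(f[k]) * sq_norm(g[k]) for k in range(len(w))
    )
```

Both functions return the same value for this toy circuit, but `partition_orthogonal` touches each parameter once, whereas `partition_brute_force` enumerates the whole domain.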

Figures (8)

  • Figure 1: Matrix-product states (MPSs) are circuits. An MPS TN of rank $R=2$, here in Penrose graphical notation (bottom right), models a function $\psi$ over $\bm{\mathrm{X}}=\{X_1,X_2,X_3\}$ as $\psi(\bm{\mathrm{X}}) = \sum_{i_1=1}^R \sum_{i_2=1}^R \psi_1^{i_1}(X_1) \psi_2^{i_1,i_2}(X_2) \psi_3^{i_2}(X_3)$. Given an assignment $\bm{\mathrm{x}} = \langle x_1,x_2,x_3\rangle$, the circuit computes the complete contraction of the MPS, i.e., $\psi(\bm{\mathrm{x}})$ (above left). The circuit input units compute the factors $\psi_1^{i_1}$, $\psi_2^{i_1,i_2}$, $\psi_3^{i_2}$ over $X_1$, $X_2$, $X_3$, highlighted in their respective colors. The composition of product and sum units encodes the contraction of the factors following a left-to-right ordering, i.e., multiplying and summing the violet ($\psi_1$) and orange ($\psi_2$) factors before the green one ($\psi_3$). Here, sum weights are fixed to 1, but can generally be any complex number.
  • Figure 2: Deterministic and orthogonal circuits differ by their input functions. (left) We consider the circuit $c$ representing the MPS shown in Figure 1, and we color each input function $\psi_2^{i_1,i_2}$ over the variable $X_2$ differently. Each sum unit is basis decomposable, as it partitions the sets of input functions over $X_2$ towards its inputs (see how colored edges are split at sum units). (right) If we take input functions over $X_2$ that have non-overlapping support (a), we recover determinism in $c$. If instead the input functions are orthogonal yet have the same support (b), then $c$ is orthogonal.
  • Figure 3:
  • Figure 4: Tensorized circuits can encode custom hierarchical factorizations with no corresponding TTN. The shown circuit encodes a factorization over $\bm{\mathrm{X}}$ using a mix of Hadamard and Kronecker product layers and two input layers per variable in $\bm{\mathrm{X}} = \{X_1, X_2, X_3\}$. Unlike the TTN in \ref{fig:ttn-circuit}, this circuit is not structured-decomposable since there are product units that factorize their scope $\bm{\mathrm{X}}$ differently (pointed by arrows): $\{\{X_3\},\{X_1,X_2\}\}$ and $\{\{X_1,X_3\},\{X_2\}\}$, as indicated with the color stripes, each corresponding to a dependency w.r.t. a particular variable. Remarkably, this circuit does satisfy properties \ref{item:uprop-layer-basis-decomposability} and \ref{item:uprop-layer-basis-decomposability-all-variables}, as the pointed product layers that are input to the root sum layer do not share input layers.
  • Figure 5: Squared unitary PCs scale better than squared PCs while retaining performance. By virtue of not materializing their squares, unitary circuits result in faster and lighter models, even when using Kronecker product layers (a). In practice, this comes without any sacrifice in model performance, as measured in bits-per-dimension (bpd, lower is better) on image datasets (b). Remarkably, our parametrization allows efficiently training squared non-structured-decomposable PCs (gray lines).
  • ...and 3 more figures
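
The contraction order described in the Figure 1 caption can be sketched in plain Python. The factor tables below are hypothetical values over binary variables, chosen only for illustration; the rank $R=2$ and the formula $\psi(\bm{\mathrm{x}}) = \sum_{i_1} \sum_{i_2} \psi_1^{i_1}(x_1) \psi_2^{i_1,i_2}(x_2) \psi_3^{i_2}(x_3)$ follow the caption. The sketch compares the direct double sum with a left-to-right contraction that first combines $\psi_1$ and $\psi_2$ and only then multiplies by $\psi_3$.

```python
R = 2  # MPS rank, as in the Figure 1 caption

# Hypothetical factor tables over binary variables (illustrative values only).
psi1 = [[1.0, 2.0], [0.5, -1.0]]                  # psi1[x1][i1]
psi2 = [[[1.0, 0.0], [0.0, 1.0]],
        [[2.0, 1.0], [-1.0, 0.5]]]                # psi2[x2][i1][i2]
psi3 = [[1.0, 1.0], [3.0, -2.0]]                  # psi3[x3][i2]


def mps_naive(x1, x2, x3):
    """Direct double sum over both bond indices i1 and i2."""
    return sum(
        psi1[x1][i1] * psi2[x2][i1][i2] * psi3[x3][i2]
        for i1 in range(R)
        for i2 in range(R)
    )


def mps_left_to_right(x1, x2, x3):
    """Contract psi1 with psi2 over i1 first, then with psi3 over i2."""
    left = [
        sum(psi1[x1][i1] * psi2[x2][i1][i2] for i1 in range(R))
        for i2 in range(R)
    ]
    return sum(left[i2] * psi3[x3][i2] for i2 in range(R))


print(mps_left_to_right(0, 1, 1))  # -4.0
```

Here the sum weights are implicitly fixed to 1, matching the caption; general complex weights would simply scale the terms inside each sum.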

Theorems & Definitions (45)

  • Definition 1: Circuit [choi2020pc, vergari2021compositional]
  • Definition 2: Smoothness and decomposability [darwiche2002knowledge]
  • Definition 3: Determinism (or support-decomposability) [darwiche2002knowledge, choi2020pc]
  • Definition 4: Compatibility [vergari2021compositional]
  • Definition 5: Orthogonality (or ortho-decomposability)
  • Theorem 1
  • Definition 6: Basis decomposability
  • Definition 7: Regular orthogonality
  • Theorem 2
  • Theorem 3
  • ...and 35 more