What is the Relationship between Tensor Factorizations and Circuits (and How Can We Exploit it)?

Lorenzo Loconte, Antonio Mari, Gennaro Gala, Robert Peharz, Cassio de Campos, Erik Quaeghebeur, Gennaro Vessio, Antonio Vergari

TL;DR

This work establishes a formal bridge between tensor factorizations and probabilistic circuits, showing that circuits encode generalized hierarchical tensor factorizations and that hierarchical tensor factorizations correspond to deep tensorized circuits. It introduces a modular tensorized-circuit pipeline built from Lego-like blocks (input, product, sum layers) and region graphs (RGs) to represent, learn, and scale overparameterized architectures, including folding for speed-ups. The authors connect non-negative tensor factorizations to monotone probabilistic circuits, provide a pipeline for parameterizing and inferring probability tensors, and demonstrate parameter compression via CP/Tucker factorization while preserving tractable inference. Extensive empirical evaluations across image and tabular datasets reveal how region graphs and composite layers affect time, memory, and performance, with CP-based layers and certain RGs offering favorable scalability and accuracy. The work opens opportunities for tensor factorization methods to inform circuit design and for circuit-based methods to enable new, efficient probabilistic factorizations and neuro-symbolic systems.

Abstract

This paper establishes a rigorous connection between circuit representations and tensor factorizations, two seemingly distinct yet fundamentally related areas. By connecting these fields, we highlight a series of opportunities that can benefit both communities. Our work generalizes popular tensor factorizations within the circuit language, and unifies various circuit learning algorithms under a single, generalized hierarchical factorization framework. Specifically, we introduce a modular "Lego block" approach to build tensorized circuit architectures. This, in turn, allows us to systematically construct and explore various circuit and tensor factorization models while maintaining tractability. This connection not only clarifies similarities and differences in existing models, but also enables the development of a comprehensive pipeline for building and optimizing new circuit/tensor factorization architectures. We show the effectiveness of our framework through extensive empirical evaluations, and highlight new research opportunities for tensor factorizations in probabilistic modeling.

Paper Structure (55 sections, 3 theorems, 30 equations, 35 figures, 8 tables, 7 algorithms)

Key Result

Proposition 1

Let $\bm{\mathcal{T}}\in\mathbb{R}^{I_1\times\cdots\times I_d}$ be a tensor decomposed via a multilinear rank-$(R_1,\ldots,R_d)$ Tucker factorization, as in \ref{eq:tucker-tensor-def}. Then, there exists a circuit $c$ over variables $\bm{\mathrm{X}} = \{X_j\}_{j=1}^d$ with $\mathsf{dom}(X_j) = [I_j]$ ...
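The correspondence between a Tucker factorization and a circuit can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation. It follows the description in the Figure 2 caption: input units index rows of the factor matrices, product units multiply them, and a single sum unit is weighted by the core tensor. The function name `tucker_circuit`, the tensor sizes, and the random parameters are assumptions made for illustration only.

```python
import numpy as np


def tucker_circuit(core, factors, index):
    """Evaluate a Tucker factorization at a single entry, circuit-style.

    `core` is the core tensor W, `factors` the factor matrices V^(j),
    and `index` the entry (x_1, ..., x_d). Hypothetical helper.
    """
    # Input units: for each variable X_j, pick row x_j of V^(j).
    inputs = [V[x] for V, x in zip(factors, index)]
    # Product units: one product per combination of input units,
    # i.e. the outer product of the selected rows.
    products = inputs[0]
    for v in inputs[1:]:
        products = np.multiply.outer(products, v)
    # Sum unit: weighted sum of all products, weights are the core entries.
    return float(np.sum(core * products))


rng = np.random.default_rng(0)
dims, ranks = (4, 5, 6), (2, 2, 2)  # illustrative sizes
core = rng.normal(size=ranks)
factors = [rng.normal(size=(I, R)) for I, R in zip(dims, ranks)]

# Dense reconstruction W x_1 V^(1) x_2 V^(2) x_3 V^(3), used only as a check.
dense = np.einsum("abc,ia,jb,kc->ijk", core, *factors)

entry = tucker_circuit(core, factors, (1, 2, 3))
```

Evaluating the circuit touches only one row per factor matrix and the core tensor, so a single entry is computed without ever materializing the dense tensor.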

Figures (35)

  • Figure 1: Example of a circuit (left) and its evaluation (right) for a circuit encoding the joint density over three continuous random variables $X_1, X_2, X_3$. Input units are drawn with a dedicated symbol, as they are univariate Gaussian distributions, and are labeled with their scopes (left); later on, generic input units are drawn as empty circles. To compute the joint density $p(X_1=-1.1, X_2=0.2, X_3=3.4)$, one first evaluates the Gaussian densities at the inputs (blue) and propagates the computed values. These densities are then multiplied at product units $\bigotimes$ and passed through sum units $\bigoplus$ (both in orange), whose parameters are explicitly drawn in boxes here. Sum unit weights are omitted in other pictures to avoid clutter. The value $p(X_1=-1.1, X_2=0.2, X_3=3.4)=0.91$ is obtained by collecting the output of the last unit (in purple). See \ref{sec:non-negative-factorizations} for more circuits encoding distributions.
  • Figure 2: Tucker tensor factorizations are circuits. Given a tensor $\bm{\mathcal{T}}\in\mathbb{R}^{I_1\times I_2\times I_3}$ and its multilinear rank-$(2,2,2)$ Tucker factorization $\bm{\mathcal{T}}\approx\bm{\mathcal{W}}\times_1 \bm{\mathrm{V}}^{(1)}\times_2 \bm{\mathrm{V}}^{(2)}\times_3 \bm{\mathrm{V}}^{(3)}$ (a), we can encode it as a circuit $c$ whose evaluation corresponds to computing an entry of the decomposed tensor, i.e., $t_{x_1x_2x_3}\approx c(x_1,x_2,x_3)$ for any entry index $(x_1,x_2,x_3)$ (b). The directionality of the circuit connections goes from input units to output units, but it is not shown to avoid clutter. The sum unit is parameterized by the entries $w_{ijk}$ of the core tensor $\bm{\mathcal{W}}$, while the input units are parameterized by the factor matrices $\bm{\mathrm{V}}^{(1)},\bm{\mathrm{V}}^{(2)},\bm{\mathrm{V}}^{(3)}$. For instance, evaluating the two input units depending on the index $x_1$ (b, in red) translates to indexing the $x_1$-th row of $\bm{\mathrm{V}}^{(1)}$, i.e., $\bm{\mathrm{v}}_{x_1:} = [v_{x_11}^{(1)} \ \ v_{x_12}^{(1)}]^\top$ (a, in red).
  • Figure 3: A tree RG.
  • Figure 4: Hierarchical Tucker factorizations are deep (tensorized) circuits, as shown here with the circuit representation of the hierarchical Tucker factorization of a three-dimensional tensor (a), which is obtained by stacking two Tucker factorizations according to the RG in \ref{fig:region-graph}. Evaluating the circuit from left to right for some entry $(x_1,x_2,x_3)$ computes the corresponding tensor entry. In (b) we show the equivalent tensorized architecture (\ref{defn:tensorized-circuit}) obtained by grouping units into layers, according to the graphical convention introduced in \ref{defn:tensorized-circuit}. Input layers map indices into rows of factor matrices, product layers compute Kronecker products of their inputs, and sum layers compute a matrix-vector product. The core tensors $\bm{\mathcal{W}}^{(2)}\in\mathbb{R}^{2\times 2\times 2},\bm{\mathcal{W}}^{(1)}\in\mathbb{R}^{1\times 2\times 2}$ that parameterize the sum units in (a) are reshaped into matrices $\bm{\mathrm{W}}^{(2)}\in\mathbb{R}^{2\times 4},\bm{\mathrm{W}}^{(1)}\in\mathbb{R}^{1\times 4}$ in (b). In \ref{sec:pipeline} we will refer to the composition of Kronecker product and sum layers simply as a Tucker layer, as shown in (b).
  • Figure 5: Region nodes can be shared between partitionings in a DAG RG.
  • ...and 30 more figures
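
The layer-wise computation described in the Figure 4 caption can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' code: the region graph is assumed to first combine $X_1$ and $X_2$ and then combine the result with $X_3$, the tensor dimensions and random parameters are made up, and the helper names `tucker_layer` and `hierarchical_tucker` are hypothetical. The core shapes ($2\times 2\times 2$ and $1\times 2\times 2$, reshaped to $2\times 4$ and $1\times 4$) are taken from the caption.

```python
import numpy as np


def tucker_layer(W_mat, left, right):
    """Tucker layer: Kronecker product layer followed by a sum layer."""
    # Product layer: Kronecker product of the two input vectors.
    # Sum layer: matrix-vector product with the reshaped core tensor.
    return W_mat @ np.kron(left, right)


def hierarchical_tucker(V1, V2, V3, W2, W1, index):
    """Evaluate the two-level tensorized circuit at entry (x1, x2, x3)."""
    x1, x2, x3 = index
    # Input layers: map each index to a row of its factor matrix.
    v1, v2, v3 = V1[x1], V2[x2], V3[x3]
    # Reshape core tensors into matrices (2x2x2 -> 2x4, 1x2x2 -> 1x4).
    W2_mat = W2.reshape(W2.shape[0], -1)
    W1_mat = W1.reshape(W1.shape[0], -1)
    # Inner Tucker layer over the assumed region {X1, X2}.
    hidden = tucker_layer(W2_mat, v1, v2)
    # Outer Tucker layer over {X1, X2, X3}; the output has size 1.
    return float(tucker_layer(W1_mat, hidden, v3)[0])


rng = np.random.default_rng(0)
I1, I2, I3 = 3, 4, 5  # illustrative tensor dimensions
V1 = rng.normal(size=(I1, 2))
V2 = rng.normal(size=(I2, 2))
V3 = rng.normal(size=(I3, 2))
W2 = rng.normal(size=(2, 2, 2))
W1 = rng.normal(size=(1, 2, 2))

# Dense tensor from the same parameters, built by explicit contraction.
dense = np.einsum("rab,ia,jb,orc,kc->oijk", W2, V1, V2, W1, V3)[0]

value = hierarchical_tucker(V1, V2, V3, W2, W1, (2, 1, 4))
```

Reshaping a core tensor row-major matches the ordering of `np.kron` for vectors, which is why the matrix-vector product reproduces the contraction with the original three-way core.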

Theorems & Definitions (15)

  • Definition 1: Tucker factorization [tucker1964extension]
  • Definition 2: Circuit [choi2020pc, vergari2021compositional]
  • Proposition 1: Tucker as a circuit
  • Definition 3: Unit-wise smoothness and decomposability [darwiche2002knowledge]
  • Definition 4: Region graph [dennis2012learning]
  • Definition 5: Hierarchical Tucker factorization
  • Proposition 2: Hierarchical Tucker as a deep circuit
  • Definition 6: Structured decomposability [pipatsrisawat2008new]
  • Definition 7: Tensorized circuit
  • Definition 8: Layer-wise smoothness and decomposability
  • ...and 5 more