Compositionality Unlocks Deep Interpretable Models

Thomas Dooms, Ward Gauderis, Geraint A. Wiggins, Jose Oramas

TL;DR

The paper addresses the challenge of achieving mechanistic interpretability without sacrificing accuracy by introducing χ-nets, an architecture that merges tensor-network compositional structure with deep nonlinear layers. It presents the ODT algorithm (Orthogonalisation, Diagonalisation, Truncation) to extract low-rank, interpretable features and compress the model, with a per-layer complexity of $O(L \cdot h^4)$. Empirically, a 3-layer χ-net trained on SVHN attains ~85% test accuracy while allowing substantial dimension reduction (≈70%–90%) via truncation, and the weight-based interpretability is illustrated through atom-level and eigenvector analyses that reveal prototypical digits and edge-like features. These results demonstrate a principled pathway toward interpretable, compositional AI that can inform safety, reliability, and efficiency in neural models, with clear directions for scaling and broader applications.

Abstract

We propose $\chi$-net, an intrinsically interpretable architecture combining the compositional multilinear structure of tensor networks with the expressivity and efficiency of deep neural networks. $\chi$-nets match the accuracy of their baseline counterparts. Our novel, efficient diagonalisation algorithm, ODT, reveals linear low-rank structure in a multilayer SVHN model. We leverage this toward formal weight-based interpretability and model compression.
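The idea of diagonalising a weight object and truncating its least important directions can be illustrated on a toy problem. The sketch below is not the paper's ODT algorithm; it is a generic eigendecomposition-and-truncation example on a synthetic symmetric matrix, assuming NumPy. The names `truncate_symmetric`, `interaction`, `hidden`, and `rank` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def truncate_symmetric(matrix, keep):
    """Keep the `keep` eigen-directions with the largest absolute eigenvalue."""
    # eigh assumes a symmetric input and returns orthonormal eigenvectors as columns.
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)
    # Rank directions by magnitude so strongly negative eigenvalues are kept too.
    order = np.argsort(-np.abs(eigenvalues))[:keep]
    values = eigenvalues[order]
    vectors = eigenvectors[:, order]
    # Rebuild the low-rank approximation V diag(values) V^T.
    approximation = (vectors * values) @ vectors.T
    return values, vectors, approximation


# Synthetic stand-in for a learned interaction matrix (an assumption, not model weights):
# an exactly rank-4 symmetric matrix plus small symmetric noise.
rng = np.random.default_rng(0)
hidden, rank = 32, 4
factors = rng.normal(size=(hidden, rank))
signs = np.array([1.0, -1.0, 1.0, -1.0])
noise = 1e-3 * rng.normal(size=(hidden, hidden))
interaction = (factors * signs) @ factors.T + (noise + noise.T) / 2

values, vectors, approx = truncate_symmetric(interaction, keep=rank)
relative_error = np.linalg.norm(interaction - approx) / np.linalg.norm(interaction)
print(f"kept {rank}/{hidden} dimensions, relative error {relative_error:.2e}")
```

When the matrix is close to low rank, as here, discarding most eigen-directions changes it very little, and the retained eigenvectors are the candidate features to inspect.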

Paper Structure

This paper contains 34 sections, 4 equations, 14 figures, 3 tables, 3 algorithms.

Figures (14)

  • Figure 1: String diagram of a 3-layer $\chi$-net with input $x \in \mathcal{I}$, linear embedding and unembedding maps $e: \mathcal{I} \to \mathcal{H}_1$ and $u: \mathcal{H}_4 \to \mathcal{O}$, and multilinear cores $f_i: \mathcal{H}_{i} \to \mathcal{H}_{i+1}$ for $i = 1, 2, 3$. On the left side, the network is interpreted bottom-up, reflecting the efficient forward pass evaluation. The right side contains the unfolded tree tensor network, where contraction is no longer restricted to unidirectional evaluation. The expansion introduces weight-tying patterns per layer.
  • Figure 2: The accuracy curve shows that it is possible to truncate about 70% of the model's dimensions without compromising accuracy. About 90% of dimensions can be truncated with a small drop in accuracy.
  • Figure 3: The six most important atoms ($e = f_0$) and interaction matrices ($f_1$ to $f_3$) for each layer of the decomposed model. The unembedding ($u = f_4$) is contracted into $f_3$ for brevity. The atoms are reshaped into the image's dimensions for visual clarity; they contain patterns such as edge detectors and proto-digits. The middle interaction matrices are highly sparse and are dominated by constant interactions. The root ($f_3$) is denser, combining many learned features.
  • Figure 4: The most important eigenvector of the root core ($u \circ f_3$) per digit, traced linearly through the previous cores onto the input space. The resulting images represent prototypical digits from the training data.
  • Figure 5: Using extracted features from the model to explain the classification logits. The left shows the importance scores for all features (split by positive and negative contribution), the most important ones of which are shown in the middle section. The right shows the logits along with the evaluated input.
  • ...and 9 more figures
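
Read bottom-up, the Figure 1 caption describes the forward pass of the 3-layer network as a composition of the embedding, the three cores, and the unembedding. Written out, with the output symbol $\hat{y}$ introduced here only for illustration:

$$\hat{y} = (u \circ f_3 \circ f_2 \circ f_1 \circ e)(x), \qquad x \in \mathcal{I}, \quad \hat{y} \in \mathcal{O}.$$

The intermediate spaces follow the caption: $e(x) \in \mathcal{H}_1$, each core maps $\mathcal{H}_i$ to $\mathcal{H}_{i+1}$, and $u$ maps $\mathcal{H}_4$ to $\mathcal{O}$.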