Compositional Structures in Neural Embedding and Interaction Decompositions

Matthew Trager; Alessandro Achille; Pramuditha Perera; Luca Zancato; Stefano Soatto

TL;DR

This work addresses the lack of a formal grounding for emergent linear structure in neural embeddings by establishing a precise correspondence between probabilistic dependencies and geometric patterns in embeddings via interaction decompositions. The authors define pure interaction spaces $E_I$ and show that conditional independencies in $P(Y|X)$ are equivalent to orthogonality conditions $\langle {\bm{u}}_I, {\bm{v}}_J\rangle = 0$ for relevant $I,J$, providing necessary and sufficient conditions. Key contributions include a general decomposition framework, projections $Q_I$, and a main theorem that extends prior work to arbitrary factorizations of inputs and outputs, with qualitative examples and a synthetic validation. The framework offers a principled approach for interpretable and controllable representations, connecting to exponential-family geometry and graphical models, with potential impact on understanding and guiding compositional structure in transformers and multimodal models.

Abstract

We describe a basic correspondence between linear algebraic structures within vector embeddings in artificial neural networks and conditional independence constraints on the probability distributions modeled by these networks. Our framework aims to shed light on the emergence of structural patterns in data representations, a phenomenon widely acknowledged but arguably still lacking a solid formal grounding. Specifically, we introduce a characterization of compositional structures in terms of "interaction decompositions," and we establish necessary and sufficient conditions for the presence of such structures within the representations of a model.

Paper Structure (11 sections, 7 theorems, 32 equations, 3 figures)

Introduction
Related work
Preliminaries
Interaction decompositions
Main results
Qualitative examples and discussion
Conclusions
Proofs
Exponential families
Examples of geometric structures
Plot details

Key Result

Proposition 1

In the setting described above, there exists a direct sum decomposition of vector spaces $V^\mathcal{Z} = \bigoplus_{I} E_I$, where $I$ ranges over subsets of the factor indices. The projection of $V^\mathcal{Z}$ onto $E_I$ is given by [formula missing in source], where $\pi_J: V^\mathcal{Z} \rightarrow V^\mathcal{Z}$ is described by [formula missing in source]. We also have that $\dim(E_I) = \dim(V) \cdot \prod_{i \in I} (|\mathcal{Z}_i| - 1)$.
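
A small numerical sketch can make the dimension count concrete. The displayed formulas for the projection and for $\pi_J$ are not available here, so the code assumes a standard construction: $\pi_J$ averages an embedding table over all factors outside $J$, and the projection onto $E_I$ is the inclusion-exclusion sum $\sum_{J \subseteq I} (-1)^{|I \setminus J|} \pi_J$. Both are assumptions rather than the paper's stated definitions, and all function names are hypothetical. Under those assumptions, the components sum back to the original table, and the rank of each projection agrees with $\dim(V) \cdot \prod_{i \in I} (|\mathcal{Z}_i| - 1)$.

```python
import itertools

import numpy as np


def average_out(u, keep):
    """Assumed form of pi_J: average over every factor axis not in `keep`.

    `u` has shape (|Z_1|, ..., |Z_k|, dim V); the last axis is the space V.
    """
    k = u.ndim - 1
    axes = tuple(i for i in range(k) if i not in keep)
    if not axes:
        return u.astype(float)
    mean = u.mean(axis=axes, keepdims=True)
    return np.broadcast_to(mean, u.shape).copy()


def interaction_component(u, I):
    """Assumed form of the projection onto E_I: inclusion-exclusion over J in I."""
    I = tuple(I)
    out = np.zeros(u.shape, dtype=float)
    for r in range(len(I) + 1):
        for J in itertools.combinations(I, r):
            sign = (-1) ** (len(I) - len(J))
            out += sign * average_out(u, J)
    return out


def all_subsets(k):
    """All subsets of {0, ..., k-1} as sorted tuples."""
    return [J for r in range(k + 1) for J in itertools.combinations(range(k), r)]


def decompose(u):
    """Map each subset I of factor indices to the component of u in E_I."""
    return {I: interaction_component(u, I) for I in all_subsets(u.ndim - 1)}


def component_dimension(sizes, dim_v, I):
    """Numerical rank of the assumed projection, viewed as a linear map on V^Z."""
    n = int(np.prod(sizes)) * dim_v
    basis = np.eye(n).reshape((n, *sizes, dim_v))
    images = np.stack([interaction_component(b, I).ravel() for b in basis])
    return int(np.linalg.matrix_rank(images))


if __name__ == "__main__":
    sizes, dim_v = (2, 3), 4  # two factors with 2 and 3 values, dim V = 4
    for I in all_subsets(len(sizes)):
        print(I, component_dimension(sizes, dim_v, I))
```

In this construction an additive table, one of the form $u(z_1, z_2) = a(z_1) + b(z_2)$, has a vanishing pairwise component, which is the sense in which the norm of an interaction term measures departure from a decomposable structure.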

Figures (3)

Figure 1: Left: visualization of decomposable structure from object-attribute pairs embedded with ST5-XL (Ni et al., 2021). Right: norm of interaction components for attribute-object pairs. Large norms correspond to pairs of words with strong contextual meanings.
Figure 2: Compositional structures throughout training. The top row shows the evolution of the norm of interaction components for the two input tokens ([1,1] corresponding to the pairwise interaction ${\bm{u}}_{12}$) and the second row shows projections of embeddings. Left (top and bottom): for factors aligned with tokenization ("syntactic factors"), the decomposable structure is present at initialization. Center (top and bottom): for factors that do not correspond to tokens ("semantic factors") the decomposable structure is emergent. Right (top and bottom): if the probability is not factored, the decomposable structure is destroyed by the training process.
Figure 3: Top: two factors without (left) and with (right) pairwise interactions. Middle: two factors with two and three elements without (left) and with (right) pairwise interactions. Bottom: three binary factors without pairwise interactions (left) and with only two out of three pairwise interactions (right). Note that in the last case the top and bottom faces are still parallelograms, but are not parallel.

Theorems & Definitions (20)

Proposition 1
Lemma 2
Theorem 3
Proposition 4
Proposition 5
Proposition 6
Example 7: Analogies
Example 8: Decomposable embeddings
Example 9: Conditionally independent factors
Example 10: Grammars
...and 10 more
