Geometry of Lightning Self-Attention: Identifiability and Dimension

Nathan W. Henry; Giovanni Luca Marchetti; Kathlén Kohn

TL;DR

The paper tackles the problem of understanding the geometric structure of function spaces defined by lightning self-attention, a polynomial, unnormalized variant of attention. It employs algebraic geometry to characterize fibers of the parametrization $W\mapsto\varphi_W$, enabling exact dimension calculations for the neuromanifold and revealing symmetries that drive identifiability and training dynamics. Key contributions include a complete description of generic fibers for single-layer lightning self-attention, a dimension formula for the neuromanifold in deep architectures under bottleneck assumptions, and a detailed singularity/boundary analysis for the single-layer case, along with conjectures and numerical validation for normalized and deep traditional self-attention. These results illuminate sample complexity via dimension, expose invariances shaping optimization, and lay groundwork for applying algebraic-geometric methods to attention-based models in neural networks.

Abstract

We consider function spaces defined by self-attention networks without normalization, and theoretically analyze their geometry. Since these networks are polynomial, we rely on tools from algebraic geometry. In particular, we study the identifiability of deep attention by providing a description of the generic fibers of the parametrization for an arbitrary number of layers and, as a consequence, compute the dimension of the function space. Additionally, for a single-layer model, we characterize the singular and boundary points. Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and numerically verify it in the deep case.
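The claim that unnormalized attention is polynomial can be checked numerically. The sketch below is a minimal illustration, not the authors' code: it assumes a single layer of the form $X \mapsto V X (X^\top K^\top Q X)$ with tokens stored as the columns of $X$, and the function name `lightning_attention` and all dimensions are hypothetical choices. Under that assumption every output entry is a homogeneous cubic polynomial in the entries of $X$, so scaling the input by $\lambda$ scales the output by $\lambda^3$.

```python
import numpy as np

def lightning_attention(X, K, Q, V):
    """Unnormalized (softmax-free) self-attention; assumed form, for illustration.

    X: (d, t) input, one token per column
    K, Q: (a, d) key and query matrices
    V: (d_out, d) value matrix
    """
    # Attention scores without softmax: scores[i, j] = (K x_i) . (Q x_j)
    scores = X.T @ (K.T @ Q) @ X      # shape (t, t)
    # Output column j is sum_i scores[i, j] * (V x_i)
    return V @ X @ scores             # shape (d_out, t)

rng = np.random.default_rng(0)
d, t, a, d_out = 4, 5, 3, 2           # hypothetical sizes
X = rng.standard_normal((d, t))
K = rng.standard_normal((a, d))
Q = rng.standard_normal((a, d))
V = rng.standard_normal((d_out, d))

out = lightning_attention(X, K, Q, V)

# Homogeneity of degree 3 in X: phi(lam * X) == lam**3 * phi(X)
lam = 2.0
scaled = lightning_attention(lam * X, K, Q, V)
cubic_ok = np.allclose(scaled, lam**3 * out)
```

Note that `K` and `Q` enter only through the product `K.T @ Q`, which is why the parametrization cannot distinguish different factorizations of the same product.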

Paper Structure (18 sections, 15 theorems, 36 equations, 4 figures)

Introduction and Related Work
Summary of Contributions
Lightning Self-Attention
Results
Single-Layer Identifiability
Single-Layer Geometry
Deep Networks
Traditional Self-Attention
Conclusions and Future Work
Proofs of Theoretical Results
Proof of Lemma \ref{lemm:matfact}
Proof of Theorem \ref{thm:fiberattn}
Proof of Theorem \ref{thm:geometry}
Proof of Lemma \ref{lemm:M_fibers}
Proof of Lemma \ref{lemm:induct}
...and 3 more sections

Key Result

Lemma 3.1

Suppose that $A = K^\top Q = K'^\top Q' = A'$ and that $\textnormal{rk}(A) = \textnormal{rk}(A') = a \leq d$. Then there exists a unique invertible matrix $C \in \textnormal{GL}_a(\mathbb{R})$ such that $K' = CK$ and $Q' = C^{-\top}Q$.
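The lemma can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: $K$ and $Q$ are taken to be random $a \times d$ matrices (so they have full row rank $a$ almost surely), the sizes and variable names are hypothetical, and recovering $C$ through the pseudo-inverse of $K$ is a convenience of this sketch, not a step taken from the paper's proof.

```python
import numpy as np

rng = np.random.default_rng(1)
a, d = 3, 5                           # hypothetical sizes with a <= d

# Random factors; each has rank a almost surely
K = rng.standard_normal((a, d))
Q = rng.standard_normal((a, d))

# A well-conditioned invertible a x a matrix (shifted to stay away from singular)
C = rng.standard_normal((a, a)) + a * np.eye(a)

# Transformed factors as in the lemma: K' = C K, Q' = C^{-T} Q
K_prime = C @ K
Q_prime = np.linalg.inv(C).T @ Q

# Both factorizations give the same product A = K^T Q
A = K.T @ Q
A_prime = K_prime.T @ Q_prime
same_A = np.allclose(A, A_prime)

# Since K has full row rank, K @ pinv(K) is the identity, so
# C is determined by K and K': C = K' pinv(K)
C_recovered = K_prime @ np.linalg.pinv(K)
```

The last step reflects the uniqueness part of the statement: once $K$ has rank $a$, the relation $K' = CK$ leaves no freedom in $C$.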

Figures (4)

Figure 1: A slice of the space of lightning self-attention mechanisms.
Figure 2: Diagrammatic illustration of Equation \ref{eq:triadic}.
Figure 3: Plot of the estimated and expected dimensions of the neuromanifold as $\delta$ varies.
Figure 4: Diagrammatic illustration of the symmetry involved in the cancellation argument.

Theorems & Definitions (41)

Definition 1
Definition 2
Definition 3
Lemma 3.1
proof
Theorem 3.2
proof
Corollary 3.3
proof
Theorem 3.4
...and 31 more
