Geometry of Lightning Self-Attention: Identifiability and Dimension
Nathan W. Henry, Giovanni Luca Marchetti, Kathlén Kohn
TL;DR
The paper studies the geometry of the function spaces defined by lightning self-attention, a polynomial variant of attention without normalization. Using algebraic geometry, it characterizes the fibers of the parametrization $W\mapsto\varphi_W$, which yields exact dimension calculations for the function space (the neuromanifold) and exposes the parameter symmetries that govern identifiability and influence training dynamics. Key contributions include a complete description of the generic fibers for single-layer lightning self-attention, a dimension formula for the neuromanifold of deep architectures under bottleneck assumptions, and an analysis of singular and boundary points in the single-layer case. A conjectural extension to normalized (traditional) self-attention is proved for a single layer and verified numerically for deep networks. The dimension results bear on sample complexity, the symmetries describe invariances that shape optimization, and the approach lays groundwork for applying algebraic-geometric methods to attention-based models.
Abstract
We consider function spaces defined by self-attention networks without normalization, and theoretically analyze their geometry. Since these networks are polynomial, we rely on tools from algebraic geometry. In particular, we study the identifiability of deep attention by providing a description of the generic fibers of the parametrization for an arbitrary number of layers and, as a consequence, compute the dimension of the function space. Additionally, for a single-layer model, we characterize the singular and boundary points. Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and numerically verify it in the deep case.
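The identifiability question can be made concrete with a small numerical sketch. The code below assumes one common convention for a single unnormalized attention layer, which is not spelled out in the text above: tokens are the columns of $X\in\mathbb{R}^{d\times t}$, and the layer computes $V X X^\top K^\top Q X$. The function name `lightning_attention`, the dimensions, and the random weights are illustrative choices only. Under that assumption, two families of reparametrizations leave the function unchanged, so distinct weights lie in the same fiber of $W\mapsto\varphi_W$.

```python
import numpy as np

def lightning_attention(Q, K, V, X):
    """Unnormalized self-attention (assumed convention, tokens as columns).

    Column i of the output is sum_j (x_j^T K^T Q x_i) * V x_j,
    so the map is a homogeneous cubic polynomial in X.
    """
    scores = X.T @ K.T @ Q @ X   # (t, t); entry (j, i) = x_j^T K^T Q x_i
    return V @ X @ scores        # (d_out, t)

rng = np.random.default_rng(0)
d, a, d_out, t = 4, 2, 3, 5      # hypothetical sizes; a < d is a bottleneck
Q = rng.standard_normal((a, d))
K = rng.standard_normal((a, d))
V = rng.standard_normal((d_out, d))
X = rng.standard_normal((d, t))

out = lightning_attention(Q, K, V, X)

# Symmetry 1: (Q, K) -> (C Q, C^{-T} K) keeps the product K^T Q fixed.
C = np.eye(a) + 0.1 * rng.standard_normal((a, a))   # well-conditioned, invertible
out_gl = lightning_attention(C @ Q, np.linalg.inv(C).T @ K, V, X)

# Symmetry 2: rescaling V by lam and Q by 1/lam cancels in the output.
lam = 2.5
out_scale = lightning_attention(Q / lam, K, lam * V, X)

# A generic change of Q, by contrast, changes the function.
out_other = lightning_attention(Q + rng.standard_normal((a, d)), K, V, X)

print(np.allclose(out, out_gl), np.allclose(out, out_scale))
print(np.allclose(out, out_other))
```

Under the assumed convention, the first two checks succeed because the layer depends on $Q$ and $K$ only through $K^\top Q$ and is bilinear in $(V, K^\top Q)$. Whether these symmetries account for the whole generic fiber is the kind of statement the paper addresses; the sketch only illustrates that they exist.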
