On the Anatomy of Attention
Nikhil Khatri, Tuomas Laakkonen, Jonathon Liu, Vincent Wang-Maścianica
TL;DR
On the Anatomy of Attention develops a category-theoretic, diagrammatic framework for relating and reasoning about deep learning architectures, with a focus on attention. It introduces SIMD-decorated string diagrams and a rewrite system based on universal approximation. With these tools it turns the folklore evolution of attention from Bahdanau to Vaswani into formal derivations, relates standard attention to its linearised variant, and builds a taxonomy of attention variants from the literature. Empirically, the authors find that 14 distinct attention mechanisms perform broadly comparably on a language modelling task. This suggests that the exact structure of attention may not be the sole determinant of Transformer-like performance. The framework offers a principled, scalable way to analyse and generate architectural variants, and could guide future exploration beyond conventional attention designs.
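For readers who want a concrete reference point for the mechanisms named above, the following NumPy sketch implements common textbook formulations of three of them. These are additive (Bahdanau-style) scoring, scaled dot-product (Vaswani-style) scoring, and a linearised variant using the elu(x)+1 feature map popularised by Katharopoulos et al. (2020). These formulations are assumptions for illustration, not the paper's diagrammatic definitions, and the function and weight names (`additive_attention`, `W_q`, `W_k`, `v_a`, and so on) are hypothetical. The sketch shows the shared outer shape: every query is scored against every key, the scores become positive weights that sum to one, and those weights average the values.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def additive_attention(Q, K, V, W_q, W_k, v_a):
    """Bahdanau-style scoring: score(q, k) = v_a . tanh(W_q q + W_k k)."""
    # (n, 1, h) + (1, m, h) broadcasts to an (n, m, h) tensor of hidden activations.
    hidden = np.tanh((Q @ W_q.T)[:, None, :] + (K @ W_k.T)[None, :, :])
    scores = hidden @ v_a                    # (n, m) query-key scores
    return softmax(scores) @ V               # weighted average of the values

def dot_product_attention(Q, K, V):
    """Vaswani-style scoring: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def elu_plus_one(x):
    """Strictly positive feature map phi(x) = elu(x) + 1."""
    # Clamp the exp argument so large positive inputs cannot overflow.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(Q, K, V):
    """Linearised: phi(q)^T (sum_j phi(k_j) v_j^T) / (phi(q)^T sum_j phi(k_j))."""
    phi_q, phi_k = elu_plus_one(Q), elu_plus_one(K)
    kv = phi_k.T @ V                         # (d, d_v), shared by every query
    normaliser = phi_q @ phi_k.sum(axis=0)   # (n,) strictly positive row sums
    return (phi_q @ kv) / normaliser[:, None]

# Usage on random inputs: n queries, m keys/values, width d, additive hidden size h.
rng = np.random.default_rng(0)
n, m, d, h = 3, 5, 4, 8
Q, K, V = rng.normal(size=(n, d)), rng.normal(size=(m, d)), rng.normal(size=(m, d))
W_q, W_k, v_a = rng.normal(size=(h, d)), rng.normal(size=(h, d)), rng.normal(size=h)
for name, out in [
    ("additive", additive_attention(Q, K, V, W_q, W_k, v_a)),
    ("dot-product", dot_product_attention(Q, K, V)),
    ("linearised", linear_attention(Q, K, V)),
]:
    print(name, out.shape)  # each is (3, 4)
```

The linearised variant never materialises the n-by-m score matrix. Because `phi_k.T @ V` does not depend on the query, it is computed once, so for a fixed width d the cost grows linearly rather than quadratically in sequence length.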
Abstract
We introduce a category-theoretic diagrammatic formalism in order to systematically relate and reason about machine learning models. Our diagrams present architectures intuitively but without loss of essential detail, where natural relationships between models are captured by graphical transformations, and important differences and similarities can be identified at a glance. In this paper, we focus on attention mechanisms: translating folklore into mathematical derivations, and constructing a taxonomy of attention variants in the literature. As a first example of an empirical investigation underpinned by our formalism, we identify recurring anatomical components of attention, which we exhaustively recombine to explore a space of variations on the attention mechanism.
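The idea of exhaustively recombining interchangeable components can be illustrated with a small sketch. It assumes, purely hypothetically, that an attention variant splits into two slots: a function that scores queries against keys, and a function that turns scores into mixing weights. The component lists below are illustrative placeholders, not the anatomical components the authors identify. The point is only the mechanism: take the Cartesian product of the slot options and compose each pair into a runnable variant.

```python
import itertools
import numpy as np

# Hypothetical slot 1: how query-key scores are computed.
def dot_score(Q, K):
    return Q @ K.T

def scaled_dot_score(Q, K):
    return Q @ K.T / np.sqrt(Q.shape[-1])

def cosine_score(Q, K):
    Qn = Q / np.linalg.norm(Q, axis=-1, keepdims=True)
    Kn = K / np.linalg.norm(K, axis=-1, keepdims=True)
    return Qn @ Kn.T

# Hypothetical slot 2: how scores become mixing weights over the keys.
def softmax_norm(S):
    E = np.exp(S - S.max(axis=-1, keepdims=True))
    return E / E.sum(axis=-1, keepdims=True)

def relu_norm(S, eps=1e-9):
    # Zero out negative scores, then rescale each row to sum to (about) one.
    R = np.maximum(S, 0.0)
    return R / (R.sum(axis=-1, keepdims=True) + eps)

def sigmoid_norm(S):
    # Independent gates: rows need not sum to one.
    return 1.0 / (1.0 + np.exp(-S))

SCORES = {"dot": dot_score, "scaled_dot": scaled_dot_score, "cosine": cosine_score}
NORMS = {"softmax": softmax_norm, "relu": relu_norm, "sigmoid": sigmoid_norm}

def make_attention(score, norm):
    # Compose one variant: weights = norm(score(Q, K)); output = weights @ V.
    return lambda Q, K, V: norm(score(Q, K)) @ V

# Exhaustively recombine every scoring option with every normalisation option.
VARIANTS = {
    f"{s}+{n}": make_attention(SCORES[s], NORMS[n])
    for s, n in itertools.product(SCORES, NORMS)
}

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 4)), rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
outputs = {name: attn(Q, K, V) for name, attn in VARIANTS.items()}
print(len(outputs))  # 9
```

Adding an option to either slot grows the space multiplicatively, which is why a systematic description of the components matters once more than a handful of variants are in play.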
