Self-Attention as a Parametric Endofunctor: A Categorical Framework for Transformer Architectures
Charles O'Neill
TL;DR
The paper develops a category-theoretic framework for self-attention, showing that the linear components form a parametric endomorphism in $\mathsf{Para}(\mathsf{Vect})$ and that stacking corresponds to the free monad $\mathrm{Free}(F)$ on the induced endofunctor $F$. Positional encodings are recast as (affine) monoid actions when additive, while sinusoidal schemes provide faithful, injective labelings with a universal property among faithful encodings. The linear parts of self-attention are shown to be equivariant under token permutations, and mechanistic interpretability circuits align with compositions of parametric 1-morphisms. The work unifies geometric, algebraic, and interpretability perspectives while clarifying that nonlinearities (softmax, layer normalisation) lie beyond the current linear $\mathsf{Vect}$ setting, inviting extensions to richer categorical contexts. Overall, the framework offers a principled, universal lens on transformer architecture, guiding design and interpretability analysis while pointing to future directions that incorporate nonlinear and variable-length aspects.
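To make the parametric reading concrete, the following sketch encodes a 1-morphism of $\mathsf{Para}(\mathsf{Vect})$ as a parameter value paired with a map that consumes it, and composes two such morphisms by pairing their parameters. All names (`Para`, `compose`, `qkv_morphism`, `linear_layer`, `stack`) are hypothetical illustrations, not code from the paper. The sketch carries one concrete point of the parameter space rather than the space itself, and it only demonstrates iterated composition of layers; it does not construct the free monad.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per output coordinate


def matvec(W: Matrix, x: Vector) -> Vector:
    """Apply the linear map represented by W to the vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


@dataclass(frozen=True)
class Para:
    """Hypothetical encoding of a parametric morphism A -> B.

    Para(Vect) pairs a parameter space P with a map P (x) A -> B; this sketch
    stores one concrete parameter value and a function fn(params, x).
    """
    params: Any
    fn: Callable[[Any, Vector], Vector]

    def __call__(self, x: Vector) -> Vector:
        return self.fn(self.params, x)


def compose(g: Para, f: Para) -> Para:
    """g after f: the composite is parameterised by the pair (g.params, f.params)."""
    def fn(params: Any, x: Vector) -> Vector:
        q, p = params
        return g.fn(q, f.fn(p, x))
    return Para((g.params, f.params), fn)


def qkv_morphism(W_Q: Matrix, W_K: Matrix, W_V: Matrix) -> Para:
    """Q, K, V projections as one parametric morphism A -> A (+) A (+) A.

    Parameters are the triple (W_Q, W_K, W_V); the output concatenates the
    three projections (list concatenation plays the role of direct sum).
    """
    def fn(params: Any, x: Vector) -> Vector:
        wq, wk, wv = params
        return matvec(wq, x) + matvec(wk, x) + matvec(wv, x)
    return Para((W_Q, W_K, W_V), fn)


def linear_layer(W: Matrix) -> Para:
    """A linear endomorphism A -> A parameterised by a square matrix W."""
    return Para(W, matvec)


def stack(layers: List[Para]) -> Para:
    """Iterated composition: layers[0] is applied first, layers[-1] last."""
    result = layers[0]
    for layer in layers[1:]:
        result = compose(layer, result)
    return result


# Usage: two 2x2 layers and an identity query projection.
W1 = [[1.0, 2.0], [0.0, 1.0]]
W2 = [[0.0, 1.0], [1.0, 0.0]]
ID = [[1.0, 0.0], [0.0, 1.0]]

net = stack([linear_layer(W1), linear_layer(W2)])
print(net([1.0, 1.0]))  # -> [1.0, 3.0], i.e. W2 @ (W1 @ x)
print(net.params == (W2, W1))  # -> True
print(qkv_morphism(ID, W1, W2)([1.0, 1.0]))  # -> [1.0, 1.0, 3.0, 1.0, 1.0, 1.0]
```

Composing two layers yields the parameter pair `(W2, W1)` and acts as the matrix product $W_2 W_1$, which is the sense in which stacking is iterated composition of parametric morphisms.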
Abstract
Self-attention mechanisms have revolutionised deep learning architectures, yet their core mathematical structures remain incompletely understood. In this work, we develop a category-theoretic framework focusing on the linear components of self-attention. Specifically, we show that the query, key, and value maps naturally define a parametric 1-morphism in the 2-category $\mathbf{Para(Vect)}$. On the underlying 1-category $\mathbf{Vect}$, these maps induce an endofunctor whose iterated composition precisely models multi-layer attention. We further prove that stacking multiple self-attention layers corresponds to constructing the free monad on this endofunctor. For positional encodings, we demonstrate that strictly additive embeddings correspond to monoid actions in an affine sense, while standard sinusoidal encodings, though not additive, retain a universal property among injective (faithful) position-preserving maps. We also establish that the linear portions of self-attention exhibit natural equivariance to permutations of input tokens, and show how the "circuits" identified in mechanistic interpretability can be interpreted as compositions of parametric 1-morphisms. This categorical perspective unifies geometric, algebraic, and interpretability-based approaches to transformer analysis, making explicit the underlying structures of attention. We restrict to linear maps throughout, deferring the treatment of nonlinearities such as softmax and layer normalisation, which require more advanced categorical constructions. Our results build on and extend recent work on category-theoretic foundations for deep learning, offering deeper insights into the algebraic structure of attention mechanisms.
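The contrast between additive and sinusoidal positional encodings can be checked numerically. In this sketch, a hypothetical strictly additive encoding $\mathrm{PE}(i) = i\,v$ makes $x \mapsto x + \mathrm{PE}(i)$ an action of the monoid $(\mathbb{N}, +)$ by translations, whereas the standard sinusoidal encoding (interleaved sine and cosine pairs, as introduced with the original Transformer) breaks the action law but still assigns distinct vectors to distinct positions. The embedding dimension $d = 4$, the direction $v$, and the range of 512 positions are arbitrary illustrative choices, and the injectivity check is numerical; it does not establish the universal property claimed in the paper.

```python
import math
from typing import Callable, List

Vector = List[float]
Encoding = Callable[[int], Vector]


def sinusoidal_pe(pos: int, d: int, base: float = 10000.0) -> Vector:
    """Sinusoidal encoding with interleaved sin/cos pairs (d assumed even):
    PE[2k] = sin(pos / base**(2k/d)), PE[2k+1] = cos(pos / base**(2k/d))."""
    out: Vector = []
    for k in range(d // 2):
        angle = pos / base ** (2 * k / d)
        out.extend([math.sin(angle), math.cos(angle)])
    return out


def additive_pe(v: Vector) -> Encoding:
    """Hypothetical strictly additive encoding PE(i) = i * v,
    so that PE(i + j) = PE(i) + PE(j) and PE(0) = 0."""
    return lambda i: [i * vk for vk in v]


def act(pe: Encoding, i: int, x: Vector) -> Vector:
    """Affine action of position i on an embedding x: x |-> x + PE(i)."""
    return [xk + pk for xk, pk in zip(x, pe(i))]


def close(a: Vector, b: Vector, tol: float = 1e-9) -> bool:
    """Componentwise comparison up to floating-point tolerance."""
    return len(a) == len(b) and all(abs(p - q) <= tol for p, q in zip(a, b))


def sin_pe(i: int) -> Vector:
    """Sinusoidal encoding fixed at the illustrative dimension d = 4."""
    return sinusoidal_pe(i, 4)


x = [0.5, -1.0, 2.0, 0.0]
lin_pe = additive_pe([0.1, 0.2, 0.3, 0.4])

# Action law (i + j) . x == i . (j . x): holds for the additive encoding ...
additive_ok = close(act(lin_pe, 3 + 4, x), act(lin_pe, 3, act(lin_pe, 4, x)))
# ... but fails for the sinusoidal one, which is not additive.
sinusoidal_ok = close(act(sin_pe, 3 + 4, x), act(sin_pe, 3, act(sin_pe, 4, x)))
# Injectivity check: positions 0..511 receive pairwise distinct codes.
codes = {tuple(round(c, 9) for c in sin_pe(p)) for p in range(512)}

print(additive_ok, sinusoidal_ok, len(codes))  # -> True False 512
```

The sinusoidal encoding also fails the identity law, since $\mathrm{PE}(0) = (0, 1, 0, 1) \neq 0$, which is why it is treated as a faithful labeling rather than as a monoid action.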
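Permutation equivariance of the linear parts can also be checked directly. For a permutation matrix $P$ acting on the token rows of $X$, each projection satisfies $(PX)W = P(XW)$, and because $P^\top P = I$ the softmax-free product $f(X) = (XW_Q)(XW_K)^\top(XW_V)$ satisfies $f(PX) = P\,f(X)$. The following pure-Python sketch assumes row-per-token layout; `linear_attention` is a hypothetical name, and the omitted softmax is exactly the nonlinearity the paper defers. Note that this product is polynomial rather than linear in $X$; only the projections themselves are linear maps.

```python
from typing import List

Matrix = List[List[float]]  # rows are tokens (or output coordinates)


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Naive matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A: Matrix) -> Matrix:
    """Matrix transpose A^T."""
    return [list(col) for col in zip(*A)]


def permute_rows(X: Matrix, perm: List[int]) -> Matrix:
    """Reorder tokens: row i of the result is row perm[i] of X, i.e. P @ X."""
    return [X[j] for j in perm]


def linear_attention(X: Matrix, W_Q: Matrix, W_K: Matrix, W_V: Matrix) -> Matrix:
    """Hypothetical softmax-free attention (X W_Q)(X W_K)^T (X W_V)."""
    Q, K, V = matmul(X, W_Q), matmul(X, W_K), matmul(X, W_V)
    return matmul(matmul(Q, transpose(K)), V)


# Small integer-valued example so floating-point arithmetic stays exact.
X = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
W_Q = [[1.0, 2.0], [0.0, 1.0]]
W_K = [[0.0, 1.0], [1.0, 0.0]]
W_V = [[2.0, 0.0], [1.0, 1.0]]
perm = [2, 0, 1]

# Linear projection: (P X) W_V == P (X W_V).
proj_equivariant = permute_rows(matmul(X, W_V), perm) == matmul(permute_rows(X, perm), W_V)
# Softmax-free mixing: f(P X) == P f(X), since P^T P = I.
lhs = linear_attention(permute_rows(X, perm), W_Q, W_K, W_V)
rhs = permute_rows(linear_attention(X, W_Q, W_K, W_V), perm)

print(proj_equivariant, lhs == rhs)  # -> True True
```

Integer-valued entries keep every intermediate result exact, so the equality check does not depend on the order in which the permuted rows are summed.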
