From Embeddings to Dyson Series: Transformer Mechanics as Non-Hermitian Operator Theory
Po-Hao Chang
Abstract
Transformer architectures are typically described in algorithmic and statistical terms, leaving their internal mechanics without a familiar structural language for researchers trained in physical theories. To bridge this gap, we develop a complementary operator-theoretic framework that recasts these mechanics in the language of many-body physics. Beginning from the token as a discrete index without intrinsic geometry, we show that embedding corresponds to a basis transformation into a continuous representation space. Once such a reference basis is established, self-attention naturally assumes the role of a non-Hermitian interaction operator, and network depth implements an ordered composition of these interactions. Within this formulation, several empirical properties of deep Transformers (stability at large depth, representational saturation, and the effectiveness of multi-head decomposition) find natural structural interpretations as consequences of regulated operator composition. Together, spectral geometry, channel factorization, and normalization emerge as elements of an organizing structural logic rather than as isolated architectural choices. This perspective does not rely on post-hoc analogy, but follows a constructive path in which each parallel arises from the preceding structural step. By recasting Transformer mechanics in operator language, the framework lowers the conceptual barrier between deep learning and many-body physics through shared mathematical structure, making tools and intuitions from each domain more readily legible to the other.
