Transformer models are gauge invariant: A mathematical connection between AI and particle physics
Leo van Nierop
TL;DR
This work establishes a formal link between gauge invariance in particle physics and transformer representations. It shows that a structured group action, including $SO(n_e-1)$ on embeddings and $GL(d_h)$ on attention heads, generates gauge transformations that leave the end-to-end transformer function unchanged. It derives explicit invariance conditions, counts the resulting redundant parameter dimensions, and shows that these flat directions can be removed in practice, reducing the parameter count without compromising expressivity. The paper also discusses enlarging the symmetry group and outlines future research directions for applying gauge-theory insights to transformer design and interpretation. Overall, it provides a principled framework for understanding and removing redundant degrees of freedom in transformers, linking model architecture to well-studied concepts in physics.
Abstract
In particle physics, the fundamental forces obey symmetries known as gauge invariance: a redundancy in the mathematical description of a physical system. In this article I demonstrate that the transformer architecture exhibits the same property, and show that the default representation of transformers has partially, but not fully, removed the gauge invariance.
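To make the notion of a redundant description concrete, here is a minimal worked example of one well-known reparameterization of a single attention head. It is a sketch under assumptions that the text above does not fix: token representations $X$ multiply the weight matrices from the left, the head has query, key, value and output matrices $W_Q, W_K, W_V, W_O$ with head dimension $d_h$ and no bias terms, and $A, B \in GL(d_h)$ are arbitrary invertible matrices. These symbols and conventions are illustrative and may differ from those used in the body of the article. Consider the transformation
$$
W_Q \to W_Q A, \qquad W_K \to W_K A^{-T}, \qquad W_V \to W_V B, \qquad W_O \to B^{-1} W_O .
$$
The attention scores and the value-output product then become
$$
(X W_Q A)(X W_K A^{-T})^{T} = X W_Q A A^{-1} W_K^{T} X^{T} = X W_Q W_K^{T} X^{T},
$$
$$
(X W_V B)(B^{-1} W_O) = X W_V W_O .
$$
Both expressions are unchanged, so under these assumptions the head computes the same function for every choice of $A$ and $B$. The $2 d_h^2$ entries of $A$ and $B$ therefore parameterize directions in weight space that have no effect on the model's output, which is the sense in which the description is redundant.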
