Transformer models are gauge invariant: A mathematical connection between AI and particle physics

Leo van Nierop

TL;DR

This work establishes a formal link between gauge invariance in particle physics and transformer representations by showing that a structured group action, including $SO(n_e-1)$ on embeddings and $GL(d_h)$ on attention heads, yields gauge transformations that leave the end-to-end transformer function unchanged. It derives explicit invariance conditions, counts the resulting redundant parameter dimensions, and demonstrates practical parameter reduction with flat directions that do not compromise expressivity, offering meaningful efficiency gains. The paper also discusses enlarging the symmetry group and outlines future research directions for applying gauge-theory insights to transformer design and interpretation. Overall, it provides a principled framework for understanding and removing redundant degrees of freedom in transformers, linking model architecture to well-studied concepts in physics.

Abstract

In particle physics, the fundamental forces are subject to a symmetry called gauge invariance: a redundancy in the mathematical description of a physical system. In this article I will demonstrate that the transformer architecture exhibits the same property, and show that the default representation of transformers has partially, but not fully, removed the gauge invariance.
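
To make the idea of a redundant description concrete, here is a minimal sketch of one well-known reparametrization of a single attention head. It is an illustration under stated assumptions, not the paper's own construction: the toy matrices `X`, `W_Q`, `W_K`, and `A`, and the helper names, are all hypothetical. Multiplying the query projection by an invertible matrix $A \in GL(d_h)$ and the key projection by $A^{-T}$ changes the weights but leaves the pre-softmax attention scores unchanged, because $(X W_Q A)(X W_K A^{-T})^T = X W_Q W_K^T X^T$.

```python
# Illustrative sketch (not taken from the paper): one attention head with
# head dimension d_h = 2, using exact integer arithmetic so equality is exact.

def matmul(a, b):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def attention_scores(x, w_q, w_k):
    """Pre-softmax scores (X W_Q)(X W_K)^T for a single head."""
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    return matmul(q, transpose(k))

# Hypothetical toy values: 3 tokens, model width 3, head width 2.
X = [[1, 0, 2], [0, 1, -1], [3, 1, 0]]
W_Q = [[1, 2], [0, 1], [-1, 0]]
W_K = [[2, 0], [1, 1], [0, 3]]

# An invertible 2x2 matrix A (determinant 1) and its inverse.
A = [[2, 1], [1, 1]]
A_INV = [[1, -1], [-1, 2]]

# Reparametrize: W_Q -> W_Q A and W_K -> W_K A^{-T}.
W_Q_NEW = matmul(W_Q, A)
W_K_NEW = matmul(W_K, transpose(A_INV))

scores_before = attention_scores(X, W_Q, W_K)
scores_after = attention_scores(X, W_Q_NEW, W_K_NEW)

print(scores_before == scores_after)  # the weights differ, the scores do not
```

Since $A$ has $d_h^2$ free entries, this single head already carries $d_h^2$ parameter directions along which the computed function does not change, which is the kind of redundancy the abstract refers to.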

Paper Structure

This paper contains 20 sections, 11 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Example of a helpful new direction avoiding a bad minimum.
  • Figure 2: Example of an unhelpful flat direction.