Unification of Symmetries Inside Neural Networks: Transformer, Feedforward and Neural ODE

Koji Hashimoto; Yuji Hirono; Akiyoshi Sannai

TL;DR

The paper addresses the opacity of deep architectures by recasting parametric redundancies as gauge symmetries and showing that neural ODE gauge symmetries are realized as spacetime diffeomorphisms. It develops a formal framework for general neural ODEs, derives a diffeomorphism-based characterization for linear NODEs, and lifts feedforward rescaling and transformer self-attention symmetries to the continuous NODE setting via an integrated relation. Key contributions include a theorem identifying NODE gauge symmetries with spacetime diffeomorphisms, a diffeomorphism-based bridge between discrete FFN layers and continuous NODE dynamics, and a regularization approach that acts as gauge fixing. The work proposes a unifying physics-inspired lens to analyze and potentially guide the design of transformers and other architectures, with implications for interpretable and controllable deep learning models.

Abstract

Understanding the inner workings of neural networks, including transformers, remains one of the most challenging puzzles in machine learning. This study introduces a novel approach by applying the principles of gauge symmetries, a key concept in physics, to neural network architectures. By regarding model functions as physical observables, we find that parametric redundancies of various machine learning models can be interpreted as gauge symmetries. We mathematically formulate the parametric redundancies in neural ODEs, and find that their gauge symmetries are given by spacetime diffeomorphisms, which play a fundamental role in Einstein's theory of gravity. Viewing neural ODEs as a continuum version of feedforward neural networks, we show that the parametric redundancies in feedforward neural networks are indeed lifted to diffeomorphisms in neural ODEs. We further extend our analysis to transformer models, finding natural correspondences with neural ODEs and their gauge symmetries. The concept of gauge symmetries sheds light on the complex behavior of deep learning models through physics and provides us with a unifying perspective for analyzing various machine learning architectures.
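As a rough illustration of what a parametric redundancy in self-attention can look like, the sketch below rescales the query and key projections in opposite directions and checks that the attention weights stay the same. This is a generic, well-known redundancy and not necessarily the exact transformation analyzed in the paper; the names `W_Q`, `W_K`, `M`, and `attention_weights`, the tensor sizes, and the random data are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_tokens, d_model, d_head = 5, 8, 4          # hypothetical sizes
X = rng.normal(size=(n_tokens, d_model))     # token embeddings
W_Q = rng.normal(size=(d_model, d_head))     # query projection
W_K = rng.normal(size=(d_model, d_head))     # key projection


def attention_weights(X, W_Q, W_K):
    """Row-wise softmax of the scaled query-key scores."""
    scores = (X @ W_Q) @ (X @ W_K).T / np.sqrt(W_Q.shape[1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)


# Positive diagonal rescaling M (an assumption; any invertible M works here).
M = np.diag(rng.uniform(0.5, 2.0, size=d_head))
W_Q_g = W_Q @ M                      # queries scaled by M
W_K_g = W_K @ np.linalg.inv(M).T     # keys scaled by the inverse transpose

a = attention_weights(X, W_Q, W_K)
a_g = attention_weights(X, W_Q_g, W_K_g)
print(np.allclose(a, a_g))
```

The parameters change while the model function does not, which is the sense in which the abstract treats such redundancies as gauge symmetries.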

Paper Structure (17 sections, 2 theorems, 55 equations, 3 figures)

This paper contains 17 sections, 2 theorems, 55 equations, 3 figures.

Introduction
Gauge symmetries in neural ODEs
General neural ODEs
Gauge symmetries in neural ODEs are spacetime diffeomorphisms
Linear neural ODEs
Rescaling symmetry as a spacetime diffeomorphism
Review of the rescaling symmetry in feedforward neural networks
Diffeomorphism interpretation
Transformers
Rescaling symmetry in transformers
Non-linear neural ODE as a self-attention
Regularization as a gauge fixing
Conclusions
Impact statement
Ethics review
...and 2 more sections

Key Result

Theorem 2.1

Infinitesimal deformations of a neural ODE (eq:node-ex) that do not change the input-output relation are in one-to-one correspondence with infinitesimal diffeomorphisms $\Omega \to \mathbb R^{d+1}$ that preserve the boundary $\partial \Omega$ and the monotonicity of $t(s)$.
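A minimal numerical sketch of one special case of this statement: a finite (not infinitesimal) time reparametrization $t(s)$ with $t(0)=0$ and $t(T)=T$ changes the vector field of a neural ODE but not its input-output map. The linear vector field `A(t)`, the reparametrization `t_of_s`, the initial state, and the RK4 integrator are all assumptions chosen for illustration; the theorem also covers spatial diffeomorphisms, which are not shown here.

```python
import numpy as np


def rk4(f, x0, T, n_steps=2000):
    """Integrate dx/dt = f(x, t) from t=0 to t=T with the classic RK4 scheme."""
    x, h = np.asarray(x0, dtype=float), T / n_steps
    for i in range(n_steps):
        t = i * h
        k1 = f(x, t)
        k2 = f(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = f(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = f(x + h * k3, t + h)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x


T = 1.0


def A(t):
    # Hypothetical time-dependent weight matrix of a linear neural ODE.
    return np.array([[0.0, 1.0 + t], [-1.0, 0.3 * t]])


def f(x, t):
    return A(t) @ x


def t_of_s(s):
    # Monotone reparametrization that fixes the endpoints: t(0)=0, t(T)=T.
    return s + 0.2 * np.sin(np.pi * s / T)


def dt_ds(s):
    # Derivative of t_of_s; it stays positive, so t(s) is monotone.
    return 1.0 + 0.2 * (np.pi / T) * np.cos(np.pi * s / T)


def f_tilde(x, s):
    # Reparametrized vector field: dx/ds = t'(s) * f(x, t(s)).
    return dt_ds(s) * f(x, t_of_s(s))


x0 = np.array([1.0, -0.5])
out = rk4(f, x0, T)
out_tilde = rk4(f_tilde, x0, T)
print(np.allclose(out, out_tilde, atol=1e-8))
```

The two vector fields differ at intermediate times, yet both integrations end at the same state up to discretization error.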

Figures (3)

Figure 1: Spacetime diffeomorphism used in proving Theorem 2.1. (a) The original Cartesian spacetime grid. (b) A general coordinate transformation is performed on (a), with the boundary condition that the transformation is trivial at $t=0$ and at $t=T$.
Figure 2: Commutative diagram proven in Theorem 3.1. A pair of feedforward neural networks related by the weight rescaling symmetry corresponds to a pair of neural ODEs related by a spacetime diffeomorphism.
Figure 3: Spacetime diffeomorphisms and their decomposition. The red line is the same trajectory $x(t)$ in each panel. (a) The original spacetime grid. (b) A spatial diffeomorphism is performed on (a). (c) A time reparametrization is performed on (a).
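The weight rescaling symmetry referred to in the Figure 2 caption can be sketched for a small two-layer network. The example assumes a ReLU activation, whose positive homogeneity is what makes the rescaling exact; the layer sizes, the random weights, and the name `ffn` are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # hidden layer
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # output layer


def ffn(x, W1, b1, W2, b2):
    """Two-layer feedforward network with ReLU activation."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2


# One positive scale factor per hidden neuron.
lam = rng.uniform(0.5, 2.0, size=4)

# Scale incoming weights and biases up, outgoing weights down.
W1_r, b1_r = lam[:, None] * W1, lam * b1
W2_r = W2 / lam[None, :]

x = rng.normal(size=3)
y = ffn(x, W1, b1, W2, b2)
y_r = ffn(x, W1_r, b1_r, W2_r, b2)
print(np.allclose(y, y_r))
```

Because ReLU satisfies relu(c z) = c relu(z) for c > 0, the factor introduced in the first layer cancels in the second, so the two parameter sets define the same function.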

Theorems & Definitions (3)

Theorem 2.1
proof
Theorem 3.1
