Parameter Symmetry Potentially Unifies Deep Learning Theory

Liu Ziyin, Yizhou Xu, Tomaso Poggio, Isaac Chuang

TL;DR

This paper argues that a wide range of deep learning phenomena—across learning dynamics, model complexity, and neural representations—can be unified under the umbrella of parameter symmetry and its breaking/restoration. It introduces the idea of symmetry-to-symmetry dynamics, where training proceeds via transitions between symmetry groups, and shows how these transitions relate to effective model capacity and feature learning. The authors support three core hypotheses: dynamics tracking symmetry changes, adaptive capacity controlled by symmetry boundaries, and representation learning driven by layerwise symmetry, including neural collapse and universal representations. They also discuss mechanisms and controls (regularization, noise, data augmentation) that drive symmetry restoration or breaking, and outline practical ways to engineer symmetries to shape hierarchical learning. The work suggests a potentially fundamental principle—rooted in symmetry—from which broad AI phenomena might be derived, offering a principled design lens for future models and a path toward a universal theory of deep learning.

Abstract

The dynamics of learning in modern large AI systems is hierarchical, often characterized by abrupt, qualitative shifts akin to phase transitions observed in physical systems. While these phenomena hold promise for uncovering the mechanisms behind neural networks and language models, existing theories remain fragmented, addressing specific cases. In this position paper, we advocate for the crucial role of the research direction of parameter symmetries in unifying these fragmented theories. This position is founded on a centralizing hypothesis for this direction: parameter symmetry breaking and restoration are the unifying mechanisms underlying the hierarchical learning behavior of AI models. We synthesize prior observations and theories to argue that this direction of research could lead to a unified understanding of three distinct hierarchies in neural networks: learning dynamics, model complexity, and representation formation. By connecting these hierarchies, our position paper elevates symmetry -- a cornerstone of theoretical physics -- to become a potential fundamental principle in modern AI.

Paper Structure

This paper contains 34 sections, 4 theorems, 32 equations, 16 figures, 1 table.

Key Result

Theorem 1

(Informal, ziyin2024remove) If the loss function has $G$-symmetry and the initial $\theta\in \mathbb{R}^d$ is $G$-symmetric, then (1) there exists a model with $d-{\rm rank}(P_G)$ parameters whose learning dynamics is the same as that of $\theta$, and (2) in the lazy training regime, this is equivalent to

Figures (16)

  • Figure 1: The division of solution space into hierarchies given by distinct parameter symmetries. Left: An example solution space of a model with parameter symmetries, which can be divided into hierarchies with boundaries prescribed by symmetry-breaking conditions. The more symmetry there is, the more restricted the hypothesis space becomes. Middle: The temporal (learning dynamics) and spatial (layer-wise information processing) dynamics of AI models can be characterized by the transitions between different symmetry groups. The solid line shows a symmetry restoration dynamics in which the parameter transitions from a low-symmetry state to a high-symmetry one (through time or through layers). The dashed lines show a compositional dynamics in which the model undergoes two symmetry breakings followed by a restoration. Actual learning dynamics of neural networks may involve the model first learning by breaking symmetries before regularization effects dominate [li2021happens] and restore symmetry. Similarly, for spatial processing, neural networks are found to break symmetries in early layers and restore symmetries in final layers (Section \ref{sec: representation}). Right: The more symmetry a model has, the more spatial, temporal, and functional hierarchies it has. Changes in symmetries can induce transitions between these hierarchies.
  • Figure 2: DNN learning dynamics is symmetry-to-symmetry. Recent works suggest that the learning of neural networks is primarily saddle-to-saddle [jacot2021saddle], and that escaping these saddle points coincides with a sudden change in the complexity of the network [abbe2023sgd]. At the same time, symmetries have been found to be the primary cause of the saddle points [li2016symmetry, ziyin2024symmetry]. Once the symmetry is removed, saddle points seem to disappear when interpolating between different solutions of the model [lim2024empirical]. The figure repeats an experiment similar to those in Ref. [abbe2023sgd] and shows that, for the smallest initialization scale, the loss jumps when symmetry breaking happens (black dotted lines) and plateaus when there is no symmetry breaking. As the initialization scale becomes larger, such plateaus disappear because the initial parameters are far away from symmetric states.
  • Figure 3: The complexity and generalization error of neural networks do not grow with width. A well-known observation in deep learning is that overparameterized networks not only work well, but also have generalization errors that are empirically found to be essentially independent of width [li2018neural, pinto2024generalization, galanti2023norm, mingard2025deep], an observation at odds with conventional bounds based on the Rademacher complexity [Zhang_rethink]. The existence of parameter symmetries may resolve this discrepancy because, with a fixed regularization, the maximum number of surviving neurons is upper bounded by a constant. The left figure shows that publicly available ImageNet-pretrained ViT-Base (80M) and ViT-Large (300M) models have similar degrees of symmetry in their self-attention layers. Here, each dot is a self-attention layer. The middle figure shows that the weight rank does not grow with increasing model size. A mechanism for this is that permutation symmetry implies that neuron weights within distance $o(\gamma)$ of each other must collapse [xu2025unpub1]; this means that within a fixed $n$-sphere, there can be at most $1/\gamma^n$ different neurons. This filling procedure is illustrated in the right figure: the orange circle denotes the parameter space, and the little circles are the neuron weights.
  • Figure 4: Neural networks learn a hierarchical representation. Recent works on representation learning have suggested that the rank of the latent representation first increases and then decreases through the layers [xu2023janus, masarczyk2024tunnel]. This is reasonable because, on the one hand, a network needs to be wide enough to learn disconnected decision regions [nguyen2018neural], while on the other hand, permutation symmetries drive the representation towards low-rankness [ziyin2024symmetry]. This figure repeats the experiment in Ref. [masarczyk2024tunnel] and shows the rank and the degree of symmetry breaking, which can be seen as the simplest metrics of representation complexity, for different layers in a 5-layer FCN trained on CIFAR-10. This experiment shows that hierarchical representations may be due to symmetry changes: the beginning layers feature symmetry breaking, and later layers primarily feature symmetry restoration.
  • Figure 5: Neural collapse (NC) only happens when permutation symmetry is present. NC is a primary example of how invariant high-level representations emerge in neural networks [papyan2020prevalence], and it exists quite generally in classification, regression, and large language models [andriopoulos2024prevalence, wu2024linguistic]. When NC happens, the learned representation must be low-rank; however, Ref. [ziyin2024remove] showed that if the permutation symmetries are removed, the learned representation is always full-rank. This implies that permutation symmetry is a necessary condition for NC to happen. The figures show the representation alignment of $100$ CIFAR-10 images across $10$ classes ($10$ images in each class) and illustrate the result of Ref. [ziyin2024remove]. The color represents the correlation between representations. Left: The vanilla model exhibits neural collapse, where all neurons are similar for the same class. Right: The intra-class variation becomes significant when the permutation symmetry is removed.
  • ...and 11 more figures
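
The permutation symmetry that the captions above rely on can be illustrated with a small numerical sketch. The code below is not taken from the paper: the two-layer tanh network, the random data, the learning rate, and all function names (`forward`, `loss`, `gradients`) are assumptions chosen for illustration. It checks two things under those assumptions: (a) swapping two hidden neurons leaves the loss unchanged, and (b) if two neurons start with identical weights (a symmetric state), plain gradient descent keeps them identical, so the network behaves as if it had one fewer neuron.

```python
import numpy as np

def forward(W1, w2, X):
    """Two-layer network f(x) = w2 . tanh(W1 x) (illustrative architecture)."""
    return np.tanh(X @ W1.T) @ w2

def loss(W1, w2, X, y):
    """Mean squared error with a 1/2 factor."""
    return 0.5 * np.mean((forward(W1, w2, X) - y) ** 2)

def gradients(W1, w2, X, y):
    """Analytic gradients of the loss with respect to W1 and w2."""
    H = np.tanh(X @ W1.T)                 # hidden activations, shape (n, h)
    r = (H @ w2 - y) / len(y)             # dL/df for each sample, shape (n,)
    g_w2 = H.T @ r                        # shape (h,)
    g_W1 = ((r[:, None] * w2) * (1.0 - H ** 2)).T @ X  # shape (h, d)
    return g_W1, g_w2

# Synthetic data and parameters (all values are arbitrary assumptions).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = rng.normal(size=64)
W1 = rng.normal(size=(4, 3))
w2 = rng.normal(size=4)

# (a) Permutation symmetry: swapping hidden neurons 0 and 1 leaves the loss unchanged.
perm = np.array([1, 0, 2, 3])
invariant = np.isclose(loss(W1, w2, X, y), loss(W1[perm], w2[perm], X, y))

# (b) Symmetric state: tie neurons 0 and 1, then run plain gradient descent.
W1[1] = W1[0]
w2[1] = w2[0]
lr = 0.05
for _ in range(100):
    g_W1, g_w2 = gradients(W1, w2, X, y)
    W1 -= lr * g_W1
    w2 -= lr * g_w2

# The tied neurons receive identical gradients, so they should remain tied.
tied = np.allclose(W1[0], W1[1]) and np.isclose(w2[0], w2[1])
print("loss invariant under swap:", invariant)
print("tied neurons stay tied:", tied)
```

In this sketch the tied pair contributes `2 * w2[0] * tanh(W1[0] @ x)` to the output, which is the sense in which a symmetric state has a smaller effective neuron count. Breaking the tie requires something outside noiseless gradient descent, such as a perturbation of one neuron's weights.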

Theorems & Definitions (9)

  • Definition 1
  • Definition 2
  • Theorem 1
  • Conjecture 1
  • Theorem 2
  • Theorem 3
  • proof
  • Theorem 4
  • proof