Formation of Representations in Neural Networks

Liu Ziyin; Isaac Chuang; Tomer Galanti; Tomaso Poggio

TL;DR

The paper addresses how neural networks form compact, transferable latent representations. It introduces the Canonical Representation Hypothesis (CRH), which asserts six mutual alignments among a layer's representations $H$, gradients $G$, and weights $W$ (through the weight Gram matrices $Z$, i.e., $W^\top W$ and $W W^\top$). These alignments emerge during training under a balance between gradient noise and regularization. The paper provides a minimal-assumption theory inspired by fluctuation-dissipation relations, showing that $H$, $G$, and $Z$ align when the stationarity conditions are met. When CRH is broken, these objects instead obey reciprocal power-law relations that fall into distinct phases, formalized as the Polynomial Alignment Hypothesis (PAH). The authors support CRH and PAH with extensive experiments on fully connected nets, CNNs, self-supervised learning, ResNets, and transformers. They observe strong forward and backward alignments and consistent power-law scaling across layers and models. These results unify phenomena such as neural collapse and the neural feature ansatz in a single framework, with implications for invariant learning and robustness. The authors acknowledge that the analysis is limited to fully connected layers and the interpolation regime, which motivates further work.
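To make these objects concrete, here is a minimal NumPy sketch for a single fully connected layer $h_b = W h_a$. It builds the representation covariance $H_a$, the gradient covariance $G_b$ (with $g_b = -\partial \ell / \partial h_b$), and the Gram matrices $Z_a = W^\top W$ and $Z_b = W W^\top$. It then scores how well pairs of them align. Several choices are assumptions made only for illustration: the function names, the toy squared-error loss with random targets, and the alignment score (cosine similarity of the flattened matrices). None of this is the paper's code, and the score is not necessarily its exact alignment metric.

```python
import numpy as np


def covariance(x: np.ndarray) -> np.ndarray:
    """Covariance of row-wise samples x with shape (n_samples, dim)."""
    xc = x - x.mean(axis=0, keepdims=True)
    return xc.T @ xc / x.shape[0]


def alignment(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two matrices viewed as flat vectors.

    Assumed metric for illustration: it equals 1 exactly when b is a
    positive multiple of a.
    """
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def layer_matrices(w: np.ndarray, h_a: np.ndarray, y: np.ndarray) -> dict:
    """H_a, G_b, Z_a, Z_b for a linear layer h_b = W h_a.

    Uses a hypothetical per-sample loss l = 0.5 * ||h_b - y||^2, so the
    negative output gradient is g_b = y - h_b.
    """
    h_b = h_a @ w.T          # (n, d_out): each row is W h_a
    g_b = y - h_b            # (n, d_out): -dl/dh_b for each sample
    return {
        "H_a": covariance(h_a),
        "G_b": covariance(g_b),
        "Z_a": w.T @ w,      # (d_in, d_in)
        "Z_b": w @ w.T,      # (d_out, d_out)
    }


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 6))          # untrained toy weights
h_a = rng.normal(size=(256, 6))      # toy batch of input representations
y = rng.normal(size=(256, 4))        # toy targets
m = layer_matrices(w, h_a, y)
print(alignment(m["H_a"], m["Z_a"]), alignment(m["G_b"], m["Z_b"]))
```

At random initialization these scores carry no particular meaning. Under the CRH one would expect the aligned pairs to score close to 1 after training.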

Abstract

Understanding neural representations will help open the black box of neural networks and advance our scientific understanding of modern AI systems. However, how complex, structured, and transferable representations emerge in modern neural networks has remained a mystery. Building on previous results, we propose the Canonical Representation Hypothesis (CRH), which posits a set of six alignment relations to universally govern the formation of representations in most hidden layers of a neural network. Under the CRH, the latent representations (R), weights (W), and neuron gradients (G) become mutually aligned during training. This alignment implies that neural networks naturally learn compact representations, where neurons and weights are invariant to task-irrelevant transformations. We then show that the breaking of CRH leads to the emergence of reciprocal power-law relations between R, W, and G, which we refer to as the Polynomial Alignment Hypothesis (PAH). We present a minimal-assumption theory proving that the balance between gradient noise and regularization is crucial for the emergence of the canonical representation. The CRH and PAH lead to an exciting possibility of unifying major deep learning phenomena, including neural collapse and the neural feature ansatz, in a single framework.
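Consider a layer $h_b = W h_a$. One way to see why alignment implies compactness is to take an idealized special case of the CRH as an assumption: the input representation covariance is exactly proportional to the weight Gram matrix, $H_a = c\,W^\top W$ with $c > 0$. Then for any direction $v$ that the layer ignores ($W v = 0$), the variance of the projection $v^\top h_a$ vanishes:

$$
v^\top H_a v \;=\; c\, v^\top W^\top W v \;=\; c\,\lVert W v \rVert^2 \;=\; 0 .
$$

Under this assumption, the representation fluctuates only within the row space of $W$. Variations along directions that the next layer discards are therefore absent from it. This sketches the intuition under exact proportionality and is not the paper's derivation.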

Paper Structure (48 sections, 12 theorems, 115 equations, 21 figures, 1 table)

Introduction
Related Works
Canonical Representation Hypothesis
Noise-Regularization Balance Leads to Alignment
Notation
Forward alignment.
Backward Alignment.
CRH Breaking and Polynomial Alignment Hypothesis
Breaking of CRH.
Finite-Time Breaking of CRH.
Experiments
Metric for alignment.
Settings.
CRH.
Breaking of CRH.
...and 33 more sections

Key Result

Theorem 1

Under the mean-field norm assumption, when $\mathbb{E}[\Delta H_a]=0$, $\mathbb{E}[\Delta G_b]=0$, $\mathbb{E}[\Delta (W W^\top)]=0$, and $\mathbb{E}[\Delta (W^\top W)]=0$, there exist real-valued constants $c_1, c_2, c_3, c_4 > 0$ such that […]. Additionally, if at a local minimum, […]
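The four stationarity conditions involve $W W^\top$ and $W^\top W$. Part of the intuition for why these connect to $H$ and $G$ is elementary algebra. Assume a plain SGD step $\Delta W = -\eta(\nabla_W \ell + \gamma W)$ with weight decay $\gamma$; this update rule is an assumption for illustration. Then $\Delta(W^\top W)$ contains the second-order noise term $\eta^2 (\nabla_W \ell)^\top \nabla_W \ell$, and $\Delta(W W^\top)$ contains $\eta^2 \nabla_W \ell\,(\nabla_W \ell)^\top$. For a linear layer $h_b = W h_a$, the per-sample weight gradient is the rank-one outer product $\nabla_W \ell = -g_b h_a^\top$ with $g_b = -\partial \ell / \partial h_b$. It follows that $(\nabla_W \ell)^\top \nabla_W \ell = \lVert g_b \rVert^2 h_a h_a^\top$ and $\nabla_W \ell\,(\nabla_W \ell)^\top = \lVert h_a \rVert^2 g_b g_b^\top$. Averaged over samples, these are norm-weighted versions of $H_a$ and $G_b$. If the norms are treated as roughly constant (our reading of what a mean-field norm assumption permits), they become constant multiples of $H_a$ and $G_b$. The NumPy check verifies the two identities numerically. It illustrates the algebra only and is not the paper's proof.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out = 5, 3
h_a = rng.normal(size=d_in)      # input representation of one sample
g_b = rng.normal(size=d_out)     # negative output gradient -dl/dh_b of that sample

# Per-sample weight gradient for h_b = W h_a: dl/dW = -g_b h_a^T (rank one).
grad_w = -np.outer(g_b, h_a)     # shape (d_out, d_in), same as W

# Noise term entering Delta(W^T W): equals ||g_b||^2 * h_a h_a^T.
lhs_in = grad_w.T @ grad_w
rhs_in = np.dot(g_b, g_b) * np.outer(h_a, h_a)

# Noise term entering Delta(W W^T): equals ||h_a||^2 * g_b g_b^T.
lhs_out = grad_w @ grad_w.T
rhs_out = np.dot(h_a, h_a) * np.outer(g_b, g_b)

print(np.allclose(lhs_in, rhs_in), np.allclose(lhs_out, rhs_out))  # prints "True True"
```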

Figures (21)

Figure 1: Six alignment relations in the penultimate layer and output layer of a ResNet18 trained on CIFAR-10 (res1). Left: forward CRH. Right: backward CRH. All six relations hold significantly across the two fully connected layers. The appendix also shows that the matrix ${\rm cov}(g, h)$ is well aligned with $WW^\top$. This strongly supports the key theoretical step that the cross terms align with the weights (and with $G$ and $H$).
Figure 2: The conjugate matrices ($H$, $G$, $Z$) of the penultimate layer after training (fc2). This is an example of CRH being well satisfied: all three matrices are well aligned after training.
Figure 3: The power-law alignment between the eigenvalues $\lambda_h$ and $\lambda_g$ of $H_b$ and $G_b$ in a six-layer transformer (llm). Left to right: first to penultimate layer. The grey dashed lines show the power-law relations $\lambda_h \propto \lambda_g^\alpha$ for $\alpha = 1, 2, 3$. The first layer has an exponent of 3, the second an exponent of 2, and all subsequent layers an exponent of 1. Different colors show different heads within the same layer. The range of the exponents agrees almost perfectly with the range predicted in the table of reciprocal relations. According to that table, these layers are in phases 5, 8, and 6, respectively. The setting is the same as the LLM experiment. See the appendix for fully connected nets.
Figure 4: The rank deficiency and the backward $\alpha_{gg,hh}$ in fully connected nets (fc2). The rank of the representation is strongly negatively correlated with $\alpha$. Each color corresponds to a different weight decay (from $10^{-6}$ to $10^{-4}$), and each point to a different layer of the network. The setting is the same as the fully connected net experiment.
Figure 5: Alignment between ${\rm cov}(h,h)$ and ${\rm cov}(g,g)$ in a six-layer transformer trained on the OWT dataset (llm). From left to right: layers 1, 2, 4, and 5; layer 3 appears in the introduction figure. The shaded region shows the variation (min and max) across eight different heads in the same layer. The representation-gradient alignment (RGA) is significantly stronger than both the alignment between the initial and final representations and the alignment between different heads.
...and 16 more figures
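Figure 3 reads off exponents $\alpha$ in $\lambda_h \propto \lambda_g^\alpha$. A minimal way to estimate such an exponent from two spectra is an ordinary least-squares fit of $\log \lambda_h$ against $\log \lambda_g$. Two parts of this sketch are assumptions for illustration: the pairing of eigenvalues (both spectra are sorted in descending order and matched by rank) and the fitting procedure itself. The function names are hypothetical, and the paper's exact procedure may differ. The synthetic check builds $H$ and $G$ with shared eigenvectors and $\lambda_h = 2\lambda_g^2$, so the fitted slope should be 2.

```python
import numpy as np


def power_law_exponent(lam_h: np.ndarray, lam_g: np.ndarray, eps: float = 1e-12) -> float:
    """Estimate alpha in lam_h ~ lam_g**alpha by a log-log least-squares fit.

    Eigenvalues are paired by rank after sorting both in descending order
    (an assumption); pairs with a value at or below eps are dropped before
    taking logs.
    """
    lam_h = np.sort(np.asarray(lam_h, dtype=float))[::-1]
    lam_g = np.sort(np.asarray(lam_g, dtype=float))[::-1]
    keep = (lam_h > eps) & (lam_g > eps)
    slope, _intercept = np.polyfit(np.log(lam_g[keep]), np.log(lam_h[keep]), deg=1)
    return float(slope)


def eigenvalues(cov: np.ndarray) -> np.ndarray:
    """Eigenvalues of a symmetric positive semidefinite matrix (ascending)."""
    return np.linalg.eigvalsh(cov)


# Synthetic check: shared eigenvectors and lam_h = 2 * lam_g**2.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # random orthogonal basis
lam_g = np.logspace(-2, 1, 8)
G = q @ np.diag(lam_g) @ q.T
H = q @ np.diag(2.0 * lam_g**2) @ q.T
print(round(power_law_exponent(eigenvalues(H), eigenvalues(G)), 3))
```

Rank pairing is only sensible when $\lambda_h$ is monotone in $\lambda_g$. With trained networks, one would instead pair eigenvalues along shared eigenvectors where the alignment makes that meaningful.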

Theorems & Definitions (23)

Theorem 1
Remark
Theorem 2: CRH Master Theorem
Proposition 1
Theorem 3
Lemma 1
proof
Lemma 2
Lemma 3
Lemma 4
...and 13 more
