Formation of Representations in Neural Networks
Liu Ziyin, Isaac Chuang, Tomer Galanti, Tomaso Poggio
TL;DR
The paper addresses how neural networks form compact, transferable latent representations by introducing the Canonical Representation Hypothesis (CRH), which asserts six alignment relations among layer representations $H$, gradients $G$, and weights $W$ that emerge during training under a balance between gradient noise and regularization. It provides a minimal-assumption, fluctuation-dissipation-inspired theory showing that, when stationarity conditions are met, $H$, $G$, and $Z$ (a matrix built from the weights $W$) align. When the CRH is broken, the system instead enters phases with reciprocal power-law relations among these objects, described by the Polynomial Alignment Hypothesis (PAH). The authors support the CRH and PAH with experiments on fully connected nets, CNNs, self-supervised learning, ResNets, and transformers, observing strong forward and backward alignments and consistent power-law scaling across layers and models. These results unify phenomena such as neural collapse and the neural feature ansatz under a single framework, with implications for invariant learning and robustness. The authors acknowledge limitations tied to fully connected (FC) layers and interpolation regimes, which motivate further work.
Abstract
Understanding neural representations will help open the black box of neural networks and advance our scientific understanding of modern AI systems. However, how complex, structured, and transferable representations emerge in modern neural networks has remained a mystery. Building on previous results, we propose the Canonical Representation Hypothesis (CRH), which posits a set of six alignment relations to universally govern the formation of representations in most hidden layers of a neural network. Under the CRH, the latent representations (R), weights (W), and neuron gradients (G) become mutually aligned during training. This alignment implies that neural networks naturally learn compact representations, where neurons and weights are invariant to task-irrelevant transformations. We then show that the breaking of CRH leads to the emergence of reciprocal power-law relations between R, W, and G, which we refer to as the Polynomial Alignment Hypothesis (PAH). We present a minimal-assumption theory proving that the balance between gradient noise and regularization is crucial for the emergence of the canonical representation. The CRH and PAH lead to an exciting possibility of unifying key deep learning phenomena, including neural collapse and the neural feature ansatz, in a single framework.
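To make the idea of "alignment" between representations and weights concrete, here is a minimal NumPy sketch for a single linear layer. It rests on several assumptions not stated above: the representation-side object is taken to be the second-moment matrix of the layer's inputs, the weight-side object is taken to be the Gram matrix $W^\top W$, and alignment is scored with cosine similarity between the flattened matrices. The helper names `second_moment` and `alignment` are hypothetical, and the data are synthetic rather than drawn from a trained network. The sketch covers only one of the six relations and may not match how the authors measure it.

```python
import numpy as np

def second_moment(x):
    """Uncentered second-moment matrix E[x x^T]; rows of x are samples."""
    return x.T @ x / x.shape[0]

def alignment(a, b):
    """Cosine similarity between two matrices viewed as flat vectors
    (an assumed metric, chosen here for simplicity)."""
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
n, d_in, d_out = 2000, 8, 4

# Hypothetical layer weights and the assumed weight-side matrix W^T W.
W = rng.normal(size=(d_out, d_in))
Z = W.T @ W

# Inputs constructed by hand so their second moment is close to W^T W:
# if z ~ N(0, I), then the rows of z @ W have covariance W^T W.
z = rng.normal(size=(n, d_out))
h_aligned = z @ W

# Isotropic inputs, whose second moment is close to the identity instead.
h_random = rng.normal(size=(n, d_in))

score_aligned = alignment(second_moment(h_aligned), Z)
score_random = alignment(second_moment(h_random), Z)

print("aligned inputs  :", score_aligned)
print("isotropic inputs:", score_random)
```

In this toy setting the hand-built inputs score close to 1 while the isotropic inputs score noticeably lower. A power-law relation of the kind the PAH describes would need a different test, for example comparing the eigenvalue spectra of the two matrices on a log-log scale.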
