Unifying Low Dimensional Observations in Deep Learning Through the Deep Linear Unconstrained Feature Model
Connall Garrod, Jonathan P. Keating
TL;DR
This work asks why deep networks exhibit strikingly low-dimensional spectral structure. It analyzes the deep unconstrained feature model (UFM) and links that structure to deep neural collapse (DNC). It derives analytic expressions for the Hessian, gradient, and weight spectra in terms of class-mean features, showing that the layerwise Hessians have a Kronecker structure with $K^2$ outlier eigenvalues (with $K$ the number of classes) and that the full Hessian inherits this structure in deep networks. The results extend from linear to ReLU UFMs and hold under both MSE and CE losses (with adjustments), and they are validated empirically on synthetic UFMs and on real networks trained on MNIST and CIFAR-10. Altogether, DNC provides a unifying theoretical lens for curvature, gradient alignment, and weight structure, with implications for training dynamics and regularization strategies in overparameterized regimes.
Abstract
Empirical studies have revealed low-dimensional structures in the eigenspectra of weights, Hessians, gradients, and feature vectors of deep networks, consistently observed across datasets and architectures in the overparameterized regime. In this work, we analyze deep unconstrained feature models (UFMs) to provide an analytic explanation of how these structures emerge at the layerwise level, including the bulk-plus-outlier structure of the Hessian spectrum and the alignment of gradient descent with the outlier eigenspace. We show that deep neural collapse underlies these phenomena, deriving explicit expressions for the eigenvalues and eigenvectors of many deep learning matrices in terms of class feature means. Furthermore, we demonstrate that the full Hessian inherits its low-dimensional structure from the layerwise Hessians, and we empirically validate our theory in both UFMs and deep networks.
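As a rough illustration of how collapsed features can produce a Hessian with a small number of outlier eigenvalues, the sketch below builds the Hessian of a single linear layer under an unregularized MSE loss. It is a toy setting chosen for this example, not the deep model analyzed in the paper: the dimensions `K`, `d`, `n`, the random class means `M`, and the loss $L(W) = \frac{1}{2N}\lVert WH - Y\rVert_F^2$ are all assumptions. In this setting the Hessian with respect to $\mathrm{vec}(W)$ is the Kronecker product $(HH^\top/N) \otimes I_K$, so when every feature equals its class mean, $HH^\top$ has rank $K$ and the Hessian has exactly $K^2$ nonzero eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, n = 3, 8, 5  # classes, feature dimension, samples per class (illustrative values)
N = K * n          # total number of samples

# Assumption: fully collapsed features, i.e. every sample equals its class mean.
M = rng.standard_normal((d, K))  # hypothetical class means, one per column
H = np.repeat(M, n, axis=1)      # d x N feature matrix with each mean repeated n times

# For L(W) = ||W H - Y||_F^2 / (2N), the Hessian w.r.t. vec(W) (column-major)
# is (H H^T / N) kron I_K. It does not depend on W or on the targets Y.
feature_gram = H @ H.T / N
hessian = np.kron(feature_gram, np.eye(K))

eigvals = np.linalg.eigvalsh(hessian)
num_outliers = int(np.sum(eigvals > 1e-10))

print("Hessian size:", hessian.shape)
print("nonzero eigenvalues:", num_outliers, "expected K^2 =", K**2)
```

Without regularization the remaining $dK - K^2$ eigenvalues are zero in this toy case, so the example only shows where the count $K^2$ can come from; it does not reproduce the paper's expressions for the deep, regularized model.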
