Unifying Low Dimensional Observations in Deep Learning Through the Deep Linear Unconstrained Feature Model

Connall Garrod, Jonathan P. Keating

TL;DR

This work asks why deep networks exhibit strikingly low-dimensional spectral structure. It analyzes the deep unconstrained feature model (UFM) and links that structure to deep neural collapse (DNC). It derives analytic expressions for the Hessian, gradient, and weight spectra in terms of class-mean features, showing that the layer-wise Hessians have a Kronecker structure with $K^2$ outlier eigenvalues and that the full Hessian inherits this structure in deep networks. The results extend from linear to ReLU UFMs and hold under both MSE and CE losses (with adjustments), with empirical validation on synthetic UFMs and on real networks trained on MNIST and CIFAR-10. Altogether, DNC provides a unifying theoretical lens for curvature, gradient alignment, and weight structure, with implications for training dynamics and regularization strategies in overparameterized regimes.

Abstract

Empirical studies have revealed low dimensional structures in the eigenspectra of weights, Hessians, gradients, and feature vectors of deep networks, consistently observed across datasets and architectures in the overparameterized regime. In this work, we analyze deep unconstrained feature models (UFMs) to provide an analytic explanation of how these structures emerge at the layerwise level, including the bulk outlier Hessian spectrum and the alignment of gradient descent with the outlier eigenspace. We show that deep neural collapse underlies these phenomena, deriving explicit expressions for eigenvalues and eigenvectors of many deep learning matrices in terms of class feature means. Furthermore, we demonstrate that the full Hessian inherits its low dimensional structure from the layerwise Hessians, and empirically validate our theory in both UFMs and deep networks.

Paper Structure (37 sections, 18 theorems, 216 equations, 18 figures)

Key Result

Theorem 1

Consider the deep linear UFM described in eq:deep_UFM. Let the network width satisfy $d \geq K$, and consider a layer $l$ with $1 \leq l < L$. Assume further that the regularization parameter $\lambda$ satisfies the condition in eq:reg_condition. Then, at any global optimum of the loss, the layer-wise ... As a consequence, $\textrm{Hess}_l$ has rank $K^2$, with the eigenvectors of its nonzero eigenvalues given by $\hat{\mu}_c \ldots$
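The theorem statement above is truncated, so the exact form of $\textrm{Hess}_l$ is not reproduced here. As a rough illustration of why a Kronecker structure built from $K$ class means gives rank $K^2$, the sketch below uses a toy matrix, not the paper's Hessian. It assumes orthonormal stand-ins for the class means at layers $l$ and $l+1$; the names `orthonormal_means`, `M_l`, and `M_next` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 4, 10  # K classes, width d >= K, so K^2 = 16

def orthonormal_means(rng, d, K):
    # Columns are K orthonormal vectors in R^d, standing in for
    # normalized class means (an assumption made for this toy example).
    q, _ = np.linalg.qr(rng.standard_normal((d, K)))
    return q

M_l = orthonormal_means(rng, d, K)     # hypothetical layer-l class means
M_next = orthonormal_means(rng, d, K)  # hypothetical layer-(l+1) class means

# Two rank-K positive semidefinite factors built from the class means.
A = M_next @ M_next.T
B = M_l @ M_l.T

# Toy Kronecker-structured matrix of size d^2 x d^2 (NOT the paper's Hessian).
H = np.kron(A, B)
rank = np.linalg.matrix_rank(H)

# (A kron B)(a kron b) = (A a) kron (B b), so a Kronecker product of class
# means is an eigenvector of H; with orthonormal means its eigenvalue is 1.
v = np.kron(M_next[:, 1], M_l[:, 2])
residual = np.linalg.norm(H @ v - v)

print("rank:", rank, "K^2:", K**2)
print("eigenvector residual:", residual)
```

The rank follows from the general identity $\operatorname{rank}(A \otimes B) = \operatorname{rank}(A)\,\operatorname{rank}(B)$, which is the linear-algebra fact behind a $K^2$-dimensional outlier eigenspace in a matrix of this shape.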

Figures (18)

  • Figure 1: Training of a deep linear UFM. Left: Squared cosine similarity between $\mu_{c}^{(l+1)} \otimes \mu_{c'}^{(l)}$ and $\textrm{Hess}_l(\mu_{c}^{(l+1)} \otimes \mu_{c'}^{(l)})$. Middle & Right: Decomposition coefficients of $\tilde{g}^{(l)}$ in terms of the predicted eigenvectors $\mu_c^{(l+1)} \otimes \mu_{c'}^{(l)}$, measured by squared cosine similarity. Middle: $c=c'$, right: $c \neq c'$.
  • Figure 2: Histograms of the spectrum of $\textrm{Hess}_l$ for a deep linear UFM at an intermediate layer $l$ over a range of training epochs. The top $K^2=16$ outlier eigenvalues are plotted as spikes.
  • Figure 3: Early stages of training for a layer of a deep ReLU UFM. Left: Proportion of feature vector entries below $-10^{-6}$. Right: Frobenius distance of $\bar{H}_l^T \bar{H}_l$ from $I$ after normalization.
  • Figure 4: Early stages of training for a deep ReLU UFM. Left: Squared cosine similarity between $\mu_{c}^{(l+1)} \otimes \mu_{c'}^{(l)}$ and $\textrm{Hess}_l (\mu_{c}^{(l+1)} \otimes \mu_{c'}^{(l)})$. Middle & Right: Decomposition coefficients of $\tilde{g}^{(l)}$ in terms of the predicted eigenvectors $\mu_c^{(l+1)} \otimes \mu_{c'}^{(l)}$, measured by squared cosine similarity. The middle panel corresponds to $c=c'$, and the right panel to $c \neq c'$.
  • Figure 5: Histograms of the spectrum of $\textrm{Hess}_l$ for a deep ReLU UFM at an intermediate layer $l$ over a range of training epochs. The top $K^2=16$ outlier eigenvalues are plotted as spikes.
  • ...and 13 more figures
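Several captions above measure how close a candidate vector $v$ is to being an eigenvector through the squared cosine similarity between $v$ and $\textrm{Hess}_l\, v$. A minimal sketch of that diagnostic follows, assuming a random symmetric matrix as a stand-in for the Hessian; the helper `squared_cosine` and the toy matrix are illustrative and are not taken from the paper's code.

```python
import numpy as np

def squared_cosine(u, w):
    # Squared cosine of the angle between u and w; equals 1 exactly when
    # the two vectors are parallel (up to sign).
    return float(np.dot(u, w) ** 2 / (np.dot(u, u) * np.dot(w, w)))

rng = np.random.default_rng(1)
n = 50
S = rng.standard_normal((n, n))
H = S @ S.T  # symmetric PSD toy matrix, a stand-in for a Hessian

# eigh returns eigenvalues in ascending order; take the top eigenvector.
eigvals, eigvecs = np.linalg.eigh(H)
v_eig = eigvecs[:, -1]
v_rand = rng.standard_normal(n)

score_eig = squared_cosine(v_eig, H @ v_eig)     # close to 1 for an eigenvector
score_rand = squared_cosine(v_rand, H @ v_rand)  # noticeably below 1

print("eigenvector score:", score_eig)
print("random vector score:", score_rand)
```

A score near 1 means the matrix maps $v$ almost onto its own direction, which is how the left panels of Figures 1 and 4 test the predicted eigenvectors $\mu_{c}^{(l+1)} \otimes \mu_{c'}^{(l)}$.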

Theorems & Definitions (21)

  • Definition 1: Deep Neural Collapse
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Definition 2: DNC Structure in the Deep ReLU UFM
  • Theorem 7
  • Theorem 8
  • ...and 11 more