Neural collapse with unconstrained features

Dustin G. Mixon, Hans Parshall, Jianzong Pi

TL;DR

The paper addresses why neural collapse emerges during training by proposing an unconstrained-features framework in which training samples have explicit feature columns H ∈ R^{p×CN}. Through gradient-flow analysis, it shows that an invariant subspace S guides dynamics toward a strong neural-collapse state, characterized by WW^T = √N(I_C − (1/C)11^T), H = (1/√N)(W⊗1_N)^T, and b = (1/C)1_C, which satisfies NC1–NC4. The analysis reveals a decomposition of the empirical risk on S and demonstrates convergence to the strong-collapse configuration under mild conditions on the initial W, offering an optimization-geometry explanation for neural collapse. The work clarifies the role of the optimization landscape in inducing symmetric class-mean geometry and suggests directions for extending the theory to generalization and other optimization regimes.

Abstract

Neural collapse is an emergent phenomenon in deep learning that was recently discovered by Papyan, Han and Donoho. We propose a simple "unconstrained features model" in which neural collapse also emerges empirically. By studying this model, we provide some explanation for the emergence of neural collapse in terms of the landscape of empirical risk.

Paper Structure

This paper contains 4 sections, 4 theorems, 47 equations, 2 figures.

Key Result

Lemma 1

Figures (2)

  • Figure 1: The emergence of strong neural collapse. Run gradient descent to minimize $R_e(H,W,b)$ for $C=N=3$ and $p=15$, initializing at a random choice of $H_0$ and $W_0$ with $\|H_0\|_F=\|W_0\|_F=\varepsilon$ and $b_0=0$. At each iteration, quantify the error in the first strong neural collapse condition by $\|WW^\top-\sqrt{N}(I_C-\frac{1}{C}1_C1_C^\top)\|_F$ (plotted on the left), the relative error in the second condition by $\|H-\frac{1}{\sqrt{N}}(W\otimes 1_N)^\top\|_F/\|H\|_F$ (plotted in the middle), and the error in the third condition by $\|b-\frac{1}{C}1_C\|_2$ (plotted on the right). Apparently, the limit point of gradient descent approaches strong neural collapse as the initialization approaches the origin.
  • Figure 2: Gradient descent maintains a small distance from the invariant subspace $S$. Run gradient descent to minimize $R_e(H,W,b)$ for $C=N=3$ and $p=15$, initializing at a random choice of $H_0$ and $W_0$ with $\|H_0\|_F=\|W_0\|_F=\varepsilon$ and $b_0=0$. At each iteration, quantify the relative distance from $S$ by $\|Z-\Pi_SZ\|_E/\|Z\|_E$, where $Z=(H,W,b)$ and $\|(H,W,b)\|_E^2:=\|H\|_F^2+\|W\|_F^2+\|b\|_2^2$.
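
The three error measures in the Figure 1 caption can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: it builds one configuration that satisfies the three strong neural collapse conditions by construction and evaluates the error measures on it. The function name `snc_errors`, the particular construction of $W$, and the label matrix `Y` (one-hot labels with columns grouped by class) are assumptions made for illustration; the summary above does not spell out $R_e$, so the sketch does not run the gradient descent experiment itself.

```python
import numpy as np

def snc_errors(H, W, b):
    """Return the three error measures from the Figure 1 caption."""
    C = W.shape[0]
    N = H.shape[1] // C
    P = np.eye(C) - np.ones((C, C)) / C            # I_C - (1/C) 1 1^T
    H_target = np.kron(W, np.ones((N, 1))).T / np.sqrt(N)
    err_W = np.linalg.norm(W @ W.T - np.sqrt(N) * P)          # Frobenius norm
    err_H = np.linalg.norm(H - H_target) / np.linalg.norm(H)  # relative error
    err_b = np.linalg.norm(b - np.ones(C) / C)
    return err_W, err_H, err_b

C, N, p = 3, 3, 15                                  # sizes from the captions
rng = np.random.default_rng(0)

# Q has orthonormal rows (Q @ Q.T = I_C), taken from a reduced QR factorization.
Q = np.linalg.qr(rng.standard_normal((p, C)))[0].T

# P is a projection (P @ P = P), so W = N^{1/4} P Q gives W W^T = sqrt(N) P.
P = np.eye(C) - np.ones((C, C)) / C
W = N ** 0.25 * P @ Q                               # shape (C, p)
H = np.kron(W, np.ones((N, 1))).T / np.sqrt(N)      # shape (p, C*N)
b = np.ones(C) / C

errors = snc_errors(H, W, b)

# Assumed label layout: one-hot labels, columns grouped by class.
Y = np.kron(np.eye(C), np.ones((1, N)))
fit_residual = np.linalg.norm(W @ H + b[:, None] - Y)
print(errors, fit_residual)
```

In this configuration all three errors vanish up to rounding, and $WH+b1^\top$ reproduces the assumed one-hot label matrix, which follows from $WH=(I_C-\frac{1}{C}1_C1_C^\top)\otimes 1_N^\top$ and $b1^\top=\frac{1}{C}1_C1_{CN}^\top$.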

Theorems & Definitions (8)

  • Lemma 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • Lemma 4
  • proof