Geometric Analysis of Unconstrained Feature Models with $d=K$
Yi Shen, Shao Gu
TL;DR
For $d=K$, the paper analyzes unconstrained feature models under the cross-entropy loss $\mathcal{L}_{CE}$ and the mean-squared-error loss $\mathcal{L}_{MSE}$, and proves that both are strict saddle functions with no spurious local minima. Critical points are characterized by the rank constraint $\operatorname{rank}(\bm{W})=\operatorname{rank}(\bm{H})\le K-1$ together with a relation involving $\nabla g(\bm{R})$, and every critical point that is not a global minimizer has a direction of negative curvature. At global minima, neural-collapse properties emerge: $\bm{W}^{\star\top}$ forms a $K$-Simplex ETF (up to scaling and rotation) and the class means are centered. These results imply that setting the feature dimension to $K$ saves memory and computation while preserving convergence to neural-collapse-compatible optima, since gradient methods can escape strict saddles and reach global minimizers.
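The Simplex ETF geometry mentioned above can be checked numerically. The sketch below assumes the standard definition used in the neural-collapse literature, $\bm{M}=\sqrt{K/(K-1)}\,(\bm{I}_K-\tfrac{1}{K}\bm{1}\bm{1}^\top)$, and verifies that its columns have unit norm, pairwise cosine $-1/(K-1)$, zero mean, and rank $K-1$ (consistent with the rank bound above). The function name `simplex_etf` and the choice $K=4$ are illustrative and do not come from the paper.

```python
import numpy as np


def simplex_etf(K: int) -> np.ndarray:
    """Return the K x K matrix sqrt(K/(K-1)) * (I - 11^T / K).

    This is the canonical (unrotated, unit-scale) Simplex ETF; the paper's
    result holds up to scaling and rotation, which this sketch does not apply.
    """
    if K < 2:
        raise ValueError("K must be at least 2")
    centering = np.eye(K) - np.ones((K, K)) / K
    return np.sqrt(K / (K - 1)) * centering


K = 4  # illustrative number of classes (assumption)
M = simplex_etf(K)

# Column norms and pairwise cosines between columns.
norms = np.linalg.norm(M, axis=0)
cosines = (M.T @ M) / np.outer(norms, norms)
off_diagonal = cosines[~np.eye(K, dtype=bool)]

# The columns sum to the zero vector, so the rank is K - 1 rather than K.
column_sum = M.sum(axis=1)
rank = np.linalg.matrix_rank(M)

print("unit norms:      ", np.allclose(norms, 1.0))
print("equal cosines:   ", np.allclose(off_diagonal, -1.0 / (K - 1)))
print("centered columns:", np.allclose(column_sum, 0.0))
print("rank == K - 1:   ", rank == K - 1)
```

Because the columns are centered, the matrix cannot have full rank even when $d=K$, which is why a bound of $K-1$ rather than $K$ appears in the critical-point characterization.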
Abstract
Recently, interesting empirical phenomena known as Neural Collapse have been observed during the final phase of training deep neural networks for classification tasks. We examine these phenomena when the feature dimension $d$ equals the number of classes $K$. We demonstrate that two popular unconstrained feature models are strict saddle functions: every critical point is either a global minimum or a strict saddle point that can be escaped along a direction of negative curvature. These findings conclusively confirm the conjecture on unconstrained feature models posed in previous work.
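For reference, the strict saddle property is commonly stated as follows. This is a standard formulation from the nonconvex optimization literature; the symbols $f$ and $\bm{x}$ are generic placeholders rather than the paper's notation. A twice-differentiable function $f$ is a strict saddle function if every critical point is either a local minimizer or has a Hessian with a strictly negative eigenvalue:

$$\nabla f(\bm{x}) = \bm{0} \;\Longrightarrow\; \bm{x}\ \text{is a local minimizer} \quad\text{or}\quad \lambda_{\min}\!\left(\nabla^2 f(\bm{x})\right) < 0.$$

Combined with the absence of spurious local minima, this gives the dichotomy stated in the abstract: each critical point is a global minimum or a strict saddle.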
