Geometric Analysis of Unconstrained Feature Models with $d=K$

Yi Shen, Shao Gu

TL;DR

For $d=K$, the paper analyzes unconstrained feature models under $\mathcal{L}_{CE}$ and $\mathcal{L}_{MSE}$ losses and proves that both have no spurious local minima and are strict saddle functions. It characterizes critical points via a rank constraint $\operatorname{rank}(\bm{W})=\operatorname{rank}(\bm{H})\le K-1$ and a relation with $\nabla g(\bm{R})$, showing that all non-minimizers have a negative curvature direction. At global minima, neural-collapse properties emerge, notably that $\bm{W}^{\star\top}$ forms a $K$-Simplex ETF (up to scale/rotation) and class means are centered. These results imply that setting the feature dimension to $K$ yields memory and computation savings while preserving convergence to neural-collapse-compatible optima, since gradient methods can escape strict saddles to reach global minimizers.
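To make the Simplex ETF structure mentioned above concrete, the following is a minimal NumPy sketch. It assumes the standard definition from the neural-collapse literature, $\sqrt{K/(K-1)}\,(\bm{I}_K - \tfrac{1}{K}\bm{1}\bm{1}^\top)$, with unit-norm columns; the function name `simplex_etf` and this particular scaling are illustrative choices, not taken from the paper. It checks that the columns are centered, have equal pairwise inner products, and span only $K-1$ dimensions, which is consistent with the rank bound $\le K-1$ stated above.

```python
import numpy as np


def simplex_etf(K: int) -> np.ndarray:
    """Return a K x K matrix whose columns form a K-Simplex ETF.

    Assumed scaling: unit-norm columns (hypothetical choice for illustration).
    """
    if K < 2:
        raise ValueError("K must be at least 2")
    return np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)


K = 4
M = simplex_etf(K)

gram = M.T @ M                          # pairwise inner products of the columns
rank = np.linalg.matrix_rank(M)         # expected to be K - 1
column_sum = M.sum(axis=1)              # sum of all columns, i.e. M @ ones
off_diagonal = gram[~np.eye(K, dtype=bool)]

print("rank:", rank)
print("column norms squared:", np.diag(gram))
print("off-diagonal inner products:", off_diagonal)
print("sum of columns:", column_sum)
```

Any rotation or positive rescaling of `M` keeps the equal-angle and centering properties, which is why the result is stated only up to scale and rotation.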

Abstract

Recently, interesting empirical phenomena known as Neural Collapse have been observed during the final phase of training deep neural networks for classification tasks. We examine this issue when the feature dimension d is equal to the number of classes K. We demonstrate that two popular unconstrained feature models are strict saddle functions, with every critical point being either a global minimum or a strict saddle point that can be exited using negative curvatures. The primary findings conclusively confirm the conjecture on the unconstrained feature models in previous articles.
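The strict saddle property can be illustrated on a toy problem. The sketch below does not use the paper's objectives; it assumes a hypothetical two-variable analogue, $f(w,h)=\tfrac12(wh-1)^2+\tfrac{\lambda}{2}(w^2+h^2)$ with $\lambda=0.1$, chosen only because its critical points can be worked out by hand. The origin is a critical point whose Hessian has eigenvalues $\lambda-1<0$ and $\lambda+1>0$, so it is a strict saddle, and the only other critical points are the minimizers $w=h=\pm\sqrt{1-\lambda}$. Plain gradient descent started from a small perturbation of the origin leaves the saddle along the negative-curvature direction.

```python
import numpy as np

LAM = 0.1  # regularization weight (assumed value for this toy example)


def loss(w, h, lam=LAM):
    """Toy regularized factorization loss (not the paper's objective)."""
    return 0.5 * (w * h - 1.0) ** 2 + 0.5 * lam * (w ** 2 + h ** 2)


def grad(w, h, lam=LAM):
    """Gradient of the toy loss with respect to (w, h)."""
    r = w * h - 1.0
    return np.array([r * h + lam * w, r * w + lam * h])


def hessian(w, h, lam=LAM):
    """Hessian of the toy loss with respect to (w, h)."""
    cross = 2.0 * w * h - 1.0
    return np.array([[h * h + lam, cross], [cross, w * w + lam]])


# The origin is a critical point with one negative Hessian eigenvalue.
origin_grad = grad(0.0, 0.0)
origin_eigs = np.linalg.eigvalsh(hessian(0.0, 0.0))  # ascending order

# Gradient descent from a small perturbation of the saddle.
x = np.array([1e-3, 2e-3])
step = 0.1
for _ in range(2000):
    x = x - step * grad(x[0], x[1])
final_loss = loss(x[0], x[1])

print("Hessian eigenvalues at origin:", origin_eigs)
print("final iterate:", x)
print("final loss:", final_loss)
```

The starting point has a positive component along $(1,1)$, so the iterates approach $w=h=\sqrt{1-\lambda}$; a start with a negative component would reach the mirrored minimizer instead.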

Paper Structure

This paper contains 4 sections, 6 theorems, 103 equations.

Key Result

Theorem 1.1

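Assume that the feature dimension $d$ is equal to the number of classes $K$. The cross-entropy objective $f^{C}(\bm{W},\bm{H},\bm{b})$ is a strict saddle function with no spurious local minimum, in the sense that every critical point is either a global minimum or a strict saddle point with a direction of negative curvature.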

Theorems & Definitions (12)

  • Theorem 1.1
  • Theorem 1.2
  • Lemma 3.1
  • proof
  • proof : Proof of Theorem 1.2
  • Proposition 4.1
  • proof
  • Proposition 4.2
  • proof
  • Proposition 4.3
  • ...and 2 more