Geometric Analysis of Unconstrained Feature Models with $d=K$

Yi Shen; Shao Gu

TL;DR

For $d=K$, the paper analyzes unconstrained feature models under $\mathcal{L}_{CE}$ and $\mathcal{L}_{MSE}$ losses and proves that both have no spurious local minima and are strict saddle functions. It characterizes critical points via a rank constraint $\operatorname{rank}(\bm{W})=\operatorname{rank}(\bm{H})\le K-1$ and a relation with $\nabla g(\bm{R})$, showing that all non-minimizers have a negative curvature direction. At global minima, neural-collapse properties emerge, notably that $W^{\star\top}$ forms a $K$-Simplex ETF (up to scale/rotation) and class means are centered. These results imply that setting the feature dimension to $K$ yields memory and computation savings while preserving convergence to neural-collapse-compatible optima, since gradient methods can escape strict saddles to reach global minimizers.
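
The Simplex ETF geometry mentioned above can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes the standard construction $\sqrt{K/(K-1)}\,(\bm{I}_K - \tfrac{1}{K}\bm{1}\bm{1}^\top)$ from the neural-collapse literature, with unit scale and no rotation, and the helper names `simplex_etf` and `gram` are hypothetical. It verifies that the columns have unit norm, that distinct columns have inner product $-1/(K-1)$, and that the columns sum to zero, which is consistent with a rank of at most $K-1$.

```python
import math

def simplex_etf(K):
    """Return sqrt(K/(K-1)) * (I - (1/K) * 1 1^T) as a K x K nested list.

    Assumed standard form of a K-Simplex ETF (unit scale, identity rotation).
    """
    scale = math.sqrt(K / (K - 1))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / K) for j in range(K)]
            for i in range(K)]

def gram(M):
    """Gram matrix M^T M, i.e. pairwise inner products of the columns of M."""
    K = len(M)
    return [[sum(M[r][i] * M[r][j] for r in range(K)) for j in range(K)]
            for i in range(K)]

K = 4  # illustrative number of classes; here the feature dimension d equals K
M = simplex_etf(K)
G = gram(M)

# Squared norms of the columns (expected to be 1).
diag = [G[i][i] for i in range(K)]
# Inner products between distinct columns (expected to be -1/(K-1)).
off_diag = [G[i][j] for i in range(K) for j in range(K) if i != j]
# Sum of all columns as a vector (expected to be zero, so rank is at most K-1).
column_sum = [sum(M[r][c] for c in range(K)) for r in range(K)]
```

Any rescaling or rotation of `M` preserves the equal-norm and equal-angle pattern, which is why the result is stated only up to scale and rotation.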

Abstract

Recently, an interesting empirical phenomenon known as Neural Collapse has been observed during the final phase of training deep neural networks for classification tasks. We examine this phenomenon when the feature dimension $d$ is equal to the number of classes $K$. We demonstrate that two popular unconstrained feature models are strict saddle functions: every critical point is either a global minimum or a strict saddle point that can be escaped along a direction of negative curvature. These findings confirm the conjecture on unconstrained feature models made in previous articles.

Paper Structure (4 sections, 6 theorems, 103 equations)

Introduction
Notation
Proof of Theorem \ref{thm2}
Proof of Theorem \ref{thm3}

Key Result

Theorem 1.1

Assume that the feature dimension $d$ is equal to the number of classes $K$. The function $f^{C}(\bm{W},\bm{H},\bm{b})$ in \eqref{maince} is a strict saddle function with no spurious local minimum, in the sense that

Theorems & Definitions (12)

Theorem 1.1
Theorem 1.2
Lemma 3.1
proof
proof: Proof of Theorem \ref{thm2}
Proposition 4.1
proof
Proposition 4.2
proof
Proposition 4.3
...and 2 more