Neural collapse in the orthoplex regime

James Alcala; Rayna Andreeva; Vladimir A. Kobzar; Dustin G. Mixon; Sanghoon Na; Shashank Sule; Yangxinyu Xie

Abstract

When training a neural network for classification, the feature vectors of the training set are known to collapse to the vertices of a regular simplex, provided the dimension $d$ of the feature space and the number $n$ of classes satisfy $n\leq d+1$. This phenomenon is known as neural collapse. For other applications, such as language models, one instead takes $n\gg d$. Here, the neural collapse phenomenon still occurs, but with different emergent geometric figures. We characterize these geometric figures in the orthoplex regime where $d+2\leq n\leq 2d$. The techniques in our analysis primarily involve Radon's theorem and convexity.
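To make the two regimes in the abstract concrete, the following minimal Python sketch classifies a pair $(n,d)$ and builds a regular simplex of $n$ unit vectors. The function names `regime`, `regular_simplex`, and `dot`, as well as the label "beyond orthoplex" for $n>2d$, are illustrative assumptions and do not come from the paper. The simplex is written in $\mathbb{R}^n$ for convenience; its points span an $(n-1)$-dimensional subspace, consistent with the condition $n\leq d+1$.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def regime(n, d):
    """Classify (n, d) by the ranges stated in the abstract.

    The label for n > 2d is a placeholder chosen for this sketch.
    """
    if n <= d + 1:
        return "simplex"
    if n <= 2 * d:
        return "orthoplex"
    return "beyond orthoplex"


def regular_simplex(n):
    """Return n unit vectors in R^n forming a regular simplex.

    Each vector is e_i minus the centroid (1/n, ..., 1/n), rescaled to
    unit norm. Distinct vectors have inner product -1/(n-1).
    """
    scale = math.sqrt(n / (n - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / n) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    pts = regular_simplex(4)
    print(regime(4, 3), regime(5, 3))
    print(round(dot(pts[0], pts[1]), 6))
```

With $d=3$, four classes fall in the simplex regime, while five or six classes fall in the orthoplex regime.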

Paper Structure (5 sections, 13 theorems, 58 equations, 3 figures)

Introduction
Softmax codes in the orthoplex regime
Self duality from lack of rattlers
Not all softmax codes are created equal
Discussion

Key Result

Proposition 1

Fix $d\geq2$ and $n\geq d+2$. For every configuration $X=\{x_i\}_{i\in[n]}$ in $S^{d-1}$, it holds that $\max_{i\neq j}\langle x_i,x_j\rangle\geq0$. Furthermore, equality is achievable if and only if $n\leq 2d$.
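Rankin's orthoplex bound is commonly stated in inner-product form: for $n\geq d+2$ unit vectors in $\mathbb{R}^d$, the largest pairwise inner product is at least $0$. The sketch below assumes that form and checks it numerically. It takes $n$ of the $2d$ orthoplex vertices $\pm e_1,\ldots,\pm e_d$ as one configuration that attains equality when $d+2\leq n\leq 2d$, and compares it with a random configuration. All function names are hypothetical helpers written for this illustration, not code from the paper.

```python
import math
import random


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def max_offdiag_inner(points):
    """Largest inner product over all pairs of distinct points."""
    return max(
        dot(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )


def orthoplex_subset(n, d):
    """First n of the vertices e_1, ..., e_d, -e_1, ..., -e_d in R^d."""
    if not d + 2 <= n <= 2 * d:
        raise ValueError("need d + 2 <= n <= 2d")
    verts = []
    for sign in (1.0, -1.0):
        for i in range(d):
            verts.append([sign if j == i else 0.0 for j in range(d)])
    return verts[:n]


def random_sphere_points(n, d, rng):
    """n points on the unit sphere, obtained by normalizing Gaussian vectors."""
    pts = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(dot(v, v))
        pts.append([a / norm for a in v])
    return pts


if __name__ == "__main__":
    d, n = 3, 5
    rng = random.Random(0)
    print("orthoplex subset:", max_offdiag_inner(orthoplex_subset(n, d)))
    print("random points:  ", max_offdiag_inner(random_sphere_points(n, d, rng)))
```

In the orthoplex subset, every pair of distinct vertices is either orthogonal or antipodal, so the maximum is exactly $0$. A random configuration typically has a strictly positive maximum, which is consistent with the bound.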

Figures (3)

Figure 1: Schematic of our main results on neural collapse in the orthoplex regime. The three main theorems (on softmax codes in the orthoplex regime, on self-duality, and on low- versus high-entropy softmax codes) appear in the sections "Softmax codes in the orthoplex regime", "Self duality from lack of rattlers", and "Not all softmax codes are created equal", respectively.
Figure 2: Smallest examples of low- and high-entropy softmax codes.
Figure 3: Fix $n=10$. The first plot reports when $f_{n,\tau}$ is concave or convex. The next three plots report the best choice of $(d_1,\ldots,d_l)$ as a function of temperature $\tau$. Here, we take $d$ to be $6$, $7$, and $8$, respectively. We omit the $d=5$ case since the corresponding softmax code is unique up to rotation.

Theorems & Definitions (25)

Proposition 1: Rankin's orthoplex bound; see Theorem 1 in Rankin:55, cf. ORourke:mo
Theorem 2
Lemma 3
proof
Lemma 4
proof
proof: Proof of the theorem on softmax codes in the orthoplex regime
Theorem 5
Definition 6
Proposition 7: Theorem 3.7 in JiangEtal:24
...and 15 more
