Neural collapse in the orthoplex regime

James Alcala, Rayna Andreeva, Vladimir A. Kobzar, Dustin G. Mixon, Sanghoon Na, Shashank Sule, Yangxinyu Xie

Abstract

When training a neural network for classification, the feature vectors of the training set are known to collapse to the vertices of a regular simplex, provided the dimension $d$ of the feature space and the number $n$ of classes satisfy $n\leq d+1$. This phenomenon is known as neural collapse. For other applications like language models, one instead takes $n\gg d$. Here, the neural collapse phenomenon still occurs, but with different emergent geometric figures. We characterize these geometric figures in the orthoplex regime where $d+2\leq n\leq 2d$. The techniques in our analysis primarily involve Radon's theorem and convexity.

Paper Structure (5 sections, 13 theorems, 58 equations, 3 figures)


Key Result

Proposition 1

Fix $d\geq2$ and $n\geq d+2$. For every configuration $X=\{x_i\}_{i\in[n]}$ in $S^{d-1}$, it holds that $\max_{i,j\in[n],\,i\neq j}\langle x_i,x_j\rangle\geq0$. Furthermore, equality is achievable if and only if $n\leq 2d$.
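The bound can be checked numerically. The sketch below assumes the standard inner-product form of Rankin's orthoplex bound, namely that the largest inner product between distinct points is at least $0$ whenever $n\geq d+2$. It evaluates that quantity on vertices of the orthoplex $\{\pm e_i\}$ and on random configurations. The helper names (`max_inner_product`, `orthoplex`, `random_unit_vectors`) and the tolerance are illustrative choices, not taken from the paper.

```python
import itertools
import math
import random


def max_inner_product(points):
    """Largest inner product between two distinct points."""
    return max(
        sum(a * b for a, b in zip(x, y))
        for x, y in itertools.combinations(points, 2)
    )


def orthoplex(d):
    """The 2d vertices +e_i, -e_i of the orthoplex (cross-polytope) in R^d."""
    vertices = []
    for i in range(d):
        for sign in (1.0, -1.0):
            v = [0.0] * d
            v[i] = sign
            vertices.append(v)
    return vertices


def random_unit_vectors(n, d, rng):
    """n points on S^{d-1}, obtained by normalizing Gaussian vectors."""
    points = []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in g))
        points.append([c / norm for c in g])
    return points


d = 3

# Any n orthoplex vertices with d + 2 <= n <= 2d attain equality in the bound.
for n in range(d + 2, 2 * d + 1):
    print(n, max_inner_product(orthoplex(d)[:n]))

# Random configurations with n = d + 2 should never fall below zero
# (a small tolerance absorbs floating-point rounding).
rng = random.Random(0)
worst = min(
    max_inner_product(random_unit_vectors(d + 2, d, rng)) for _ in range(1000)
)
print(worst >= -1e-12)
```

Taking the first $n$ orthoplex vertices works because any two distinct vertices are either orthogonal or antipodal, so every pairwise inner product is $0$ or $-1$.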

Figures (3)

  • Figure 1: Schematic of our main results on neural collapse in the orthoplex regime. Theorems \ref{thm.softmax codes in orthoplex regime}, \ref{thm.selfduality}, and \ref{thm.low high entropy} appear in Sections \ref{sec.softmax orthoplex}, \ref{sec.self duality}, and \ref{sec.nonequal}, respectively.
  • Figure 2: Smallest examples of low- and high-entropy softmax codes.
  • Figure 3: Fix $n=10$. The first plot reports when $f_{n,\tau}$ is concave or convex. The next three plots report the best choice of $(d_1,\ldots,d_l)$ as a function of temperature $\tau$. Here, we take $d$ to be $6$, $7$, and $8$, respectively. We omit the $d=5$ case since the corresponding softmax code is unique up to rotation.

Theorems & Definitions (25)

  • Proposition 1: Rankin's orthoplex bound; see Theorem 1 in Rankin:55, cf. ORourke:mo
  • Theorem 2
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • proof: Proof of Theorem \ref{thm.softmax codes in orthoplex regime}
  • Theorem 5
  • Definition 6
  • Proposition 7: Theorem 3.7 in JiangEtal:24
  • ...and 15 more