All-around Neural Collapse for Imbalanced Classification

Enhao Zhang, Chaohua Li, Chuanxing Geng, Songcan Chen

TL;DR

The paper examines how Neural Collapse (NC), the geometric alignment of activations, class means, and classifier weights observed on balanced data, deteriorates under imbalanced classification because of minority collapse. It introduces All-around Neural Collapse (AllNC), an end-to-end framework that combines a Hybrid Contrastive loss (HyCon) for NC1, a Peer-to-Peer loss (P2P) for NC2/NC3, and a Generalized Bilateral-Branch Network (GBBN) for progressive feature-classifier decoupling, with the aim of restoring NC across activations, means, and weights. The authors show that minority collapse squeezes class means as well as classifier vectors, so recovering NC at the classifier alone fails to preserve self-duality. They report that AllNC achieves state-of-the-art results on balanced and long-tailed datasets (CIFAR-10/100-LT, ImageNet-LT, iNaturalist2018), with ablations supporting the contribution of each component. By preserving the NC structure, the approach improves minority-class performance and generalization, which suggests practical value for robust learning under real-world data imbalance and possible extensions to detection and segmentation tasks.

Abstract

Neural Collapse (NC) presents an elegant geometric structure that enables individual activations (features), class means and classifier (weights) vectors to reach optimal inter-class separability during the terminal phase of training on a balanced dataset. Once shifted to imbalanced classification, such an optimal structure of NC can be readily destroyed by the notorious minority collapse, where the classifier vectors corresponding to the minority classes are squeezed. In response, existing works endeavor to recover NC typically by optimizing classifiers. However, we discover that this squeezing phenomenon is not only confined to classifier vectors but also occurs with class means. Consequently, reconstructing NC solely at the classifier aspect may be futile, as the feature means remain compressed, leading to the violation of inherent self-duality in NC (i.e., class means and classifier vectors converge mutually) and incidentally, resulting in an unsatisfactory collapse of individual activations towards the corresponding class means. To shake off these dilemmas, we present a unified All-around Neural Collapse framework (AllNC), aiming to comprehensively restore NC across multiple aspects including individual activations, class means and classifier vectors. We thoroughly analyze its effectiveness and verify on multiple benchmark datasets that it achieves state-of-the-art in both balanced and imbalanced settings.

Paper Structure (35 sections, 12 equations, 8 figures, 7 tables, 1 algorithm)

Figures (8)

  • Figure 1: Geometric illustration of class means and classifier vectors. (a) Plain cross-entropy loss. (b) ETF classifier. (c) Embedded feature constraints. (d) Our method. Different colors indicate different classes, and the size of each class-mean marker is proportional to the number of samples in that class.
  • Figure 2: Inter-Class Pairwise Angles (ICPAs) of the centered class means and classifier vectors on CIFAR10 and CIFAR10-LT. Each grid cell represents one inter-class angle, and a darker blue cell indicates a larger deviation from the optimal ICPA (i.e., $96.4^\circ$).
  • Figure 3: Standard deviation (Std) of the cosines between all pairs of distinct classes, computed for the class means and the classifier vectors on CIFAR10/CIFAR10-LT. The blue lines show the Std obtained with CE loss on balanced data; the red and green lines show the Std obtained with CE loss and with the proposed AllNC, respectively, on a long-tailed dataset with imbalance rate $\beta$.
  • Figure 4: Self-duality metrics obtained on CIFAR10/100 and CIFAR10/100-LT, where the red and green lines indicate the results on the long-tailed datasets.
  • Figure 5: (a) Architecture of AllNC, which uses a contrastive framework with two additional classification heads. The encoders, predictors, MLPs, and classifiers share weights across the two branches, which largely reduces the computational cost at inference. (b) Trends of the parameters $\eta$ (solid lines) and $1-\eta$ (dashed lines) for the adjustable parameter $\gamma$.
  • ...and 3 more figures
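
The optimal ICPA in the Figure 2 caption and the Std metric in the Figure 3 caption can be illustrated with a short sketch. It assumes the standard simplex ETF property that $K$ equiangular class vectors have pairwise cosine $-1/(K-1)$, so the optimal angle is $\arccos(-1/(K-1))$. The function names, the centering by the global mean, and the use of the population standard deviation are assumptions made for illustration, not details taken from the paper.

```python
import math

def optimal_icpa_degrees(num_classes):
    """Pairwise angle (degrees) between K simplex-ETF vectors: arccos(-1/(K-1))."""
    return math.degrees(math.acos(-1.0 / (num_classes - 1)))

def pairwise_cosines(vectors):
    """Cosines between all pairs of distinct vectors, after subtracting the
    global mean (assumed centering; the paper may define it differently)."""
    dim = len(vectors[0])
    mean = [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
    centered = [[v[j] - mean[j] for j in range(dim)] for v in vectors]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    cosines = []
    for a in range(len(centered)):
        for b in range(a + 1, len(centered)):
            dot = sum(x * y for x, y in zip(centered[a], centered[b]))
            cosines.append(dot / (norms[a] * norms[b]))
    return cosines

def cosine_std(vectors):
    """Population standard deviation of the pairwise cosines (assumed convention)."""
    cos = pairwise_cosines(vectors)
    mu = sum(cos) / len(cos)
    return math.sqrt(sum((c - mu) ** 2 for c in cos) / len(cos))

# Ten classes, as in CIFAR10: rounds to the 96.4 degrees quoted in Figure 2.
print(round(optimal_icpa_degrees(10), 1))

# Toy 3-class example (hypothetical vectors, not data from the paper).
balanced = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
squeezed = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.6, 0.1]]
print(cosine_std(balanced), cosine_std(squeezed))
```

In the toy example the equiangular vectors give a Std of essentially zero, while moving one vector toward the other two raises it. This is the sense in which a larger Std signals a departure from the equiangular structure.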

Theorems & Definitions (4)

  • Definition 1: Simplex ETF
  • Remark 1
  • Remark 2
  • Remark 3
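
For reference, a simplex ETF is commonly written in the NC literature as $M = \sqrt{K/(K-1)}\, U (I_K - \frac{1}{K}\mathbf{1}_K\mathbf{1}_K^\top)$, where $U$ has orthonormal columns. The paper's Definition 1 may use different notation. The sketch below builds this matrix for the special case $U = I_K$ (an assumption chosen for simplicity, with feature dimension equal to $K$) and checks the two defining properties: unit norms and pairwise inner product $-1/(K-1)$. The function names are hypothetical.

```python
import math

def simplex_etf(num_classes):
    """Return K vectors of M = sqrt(K/(K-1)) * (I_K - (1/K) 1 1^T), taking U = I_K.
    The matrix is symmetric, so its rows and columns coincide."""
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]

def inner(u, v):
    """Plain Euclidean inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))

etf = simplex_etf(4)
norms = [math.sqrt(inner(v, v)) for v in etf]
off_diagonal = [inner(etf[a], etf[b]) for a in range(4) for b in range(a + 1, 4)]
print(norms)         # each close to 1
print(off_diagonal)  # each close to -1/3
```

Because every vector has unit norm, the pairwise inner product is also the pairwise cosine, which ties this construction to the optimal angle $\arccos(-1/(K-1))$.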