Detecting Out-of-Distribution Through the Lens of Neural Collapse

Litian Liu, Yao Qin

TL;DR

Out-of-distribution (OOD) detection remains challenging because detectors generalize unevenly across datasets and architectures, and real-time deployment requires low latency. The authors introduce the Neural Collapse Inspired OOD Detector (NCI), which combines two geometric cues: centered ID features cluster near the weight vector of their predicted class, and ID features expand toward a simplex Equiangular Tight Frame, which places them farther from the origin than OOD features. NCI therefore scores feature proximity to the weight vectors and adds an origin-distance filter to separate ID from OOD samples, at $O(P)$ complexity. By formalizing Neural Collapse (NC1–NC4) and demonstrating its practical trend in standard models, they show that these cues enable effective OOD detection across CIFAR-10/100 and ImageNet, with latency comparable to MSP and with reduced generalization discrepancies. The work unifies clustering- and energy-based perspectives under a Neural Collapse lens and provides a scalable detector, with released code, that adapts across CNNs and vision transformers.

Abstract

Out-of-Distribution (OOD) detection is critical for safe deployment; however, existing detectors often struggle to generalize across datasets of varying scales and model architectures, and some can incur high computational costs in real-world applications. Inspired by the phenomenon of Neural Collapse, we propose a versatile and efficient OOD detection method. Specifically, we re-characterize prior observations that in-distribution (ID) samples form clusters, demonstrating that, with appropriate centering, these clusters align closely with model weight vectors. Additionally, we reveal that ID features tend to expand into a simplex Equiangular Tight Frame, explaining the common observation that ID features are situated farther from the origin than OOD features. Incorporating both insights from Neural Collapse, our OOD detector leverages feature proximity to weight vectors and complements this approach by using feature norms to effectively filter out OOD samples. Extensive experiments on off-the-shelf models demonstrate the robustness of our OOD detector across diverse scenarios, mitigating generalization discrepancies and enhancing overall performance, with inference latency comparable to that of the basic softmax-confidence detector. Code is available here: https://github.com/litianliu/NCI-OOD.
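The two cues described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' released implementation: the function name `nci_style_score`, the additive combination of the two terms, the weight `alpha`, and the bias-free logits are all assumptions made for the example.

```python
import numpy as np

def nci_style_score(features, weights, global_mean, alpha=1e-3):
    """Hypothetical OOD score; higher means more ID-like under this sketch.

    features:    (N, P) penultimate-layer features
    weights:     (C, P) last-layer weight vectors, one row per class
    global_mean: (P,)   mean of the training features
    alpha:       assumed weight on the feature-norm term
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Predicted class from bias-free logits (assumption: no bias term).
    pred = np.argmax(features @ weights.T, axis=1)
    w_pred = weights[pred]

    # Cue 1: cosine similarity between the centered feature and the
    # weight vector of the predicted class.
    centered = features - global_mean
    eps = 1e-12  # guards against division by zero
    cos = np.sum(centered * w_pred, axis=1) / (
        np.linalg.norm(centered, axis=1) * np.linalg.norm(w_pred, axis=1) + eps
    )

    # Cue 2: distance from the origin, here measured with the L1 norm.
    l1 = np.abs(features).sum(axis=1)

    return cos + alpha * l1
```

A sample would then be flagged as OOD when its score falls below a threshold chosen on held-out ID data; how that threshold and `alpha` are set is left open here.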

Paper Structure (24 sections, 1 theorem, 19 equations, 7 figures, 9 tables)

This paper contains 24 sections, 1 theorem, 19 equations, 7 figures, and 9 tables.

Key Result

Theorem 3.1

(NC1) and (NC3) imply that, in the Terminal Phase of Training, for any sample $i$ with penultimate-layer feature $\bm{h}_i$ and predicted class $c$, we have $(\bm{h}_i - \bm{\mu}_G) \rightarrow \lambda \bm{w}_c$, where $\lambda = \frac{\| \bm{\mu}_{c} - \bm{\mu}_G\|_2}{\|\bm{w}_c\|_2}$.
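The reasoning behind this statement can be sketched in three lines. This is a reading under assumed notation, not the paper's own proof. The symbol $\bm{h}_i$ for the penultimate-layer feature of sample $i$ is an assumption. (NC1) is taken to mean that within-class features collapse to their class mean, and (NC3) that each classifier row aligns in direction with its centered class mean. In the Terminal Phase of Training the training samples are classified correctly, so the predicted class coincides with the label.

```latex
\begin{align*}
\text{(NC1)}\quad & \bm{h}_i \rightarrow \bm{\mu}_c, \\
\text{(NC3)}\quad & \frac{\bm{w}_c}{\|\bm{w}_c\|_2} \rightarrow
    \frac{\bm{\mu}_c - \bm{\mu}_G}{\|\bm{\mu}_c - \bm{\mu}_G\|_2}, \\
\Rightarrow\quad & \bm{h}_i - \bm{\mu}_G \rightarrow \bm{\mu}_c - \bm{\mu}_G
    = \|\bm{\mu}_c - \bm{\mu}_G\|_2 \,
      \frac{\bm{\mu}_c - \bm{\mu}_G}{\|\bm{\mu}_c - \bm{\mu}_G\|_2}
    \rightarrow \frac{\|\bm{\mu}_c - \bm{\mu}_G\|_2}{\|\bm{w}_c\|_2}\, \bm{w}_c
    = \lambda \bm{w}_c .
\end{align*}
```

In words: a centered feature ends up pointing along the weight vector of its class, scaled by the ratio $\lambda$ of the two norms.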

Figures (7)

  • Figure 1: Centered ID samples tend to cluster near their predicted class weight vectors (the last-layer weights of the corresponding class), as indicated by a higher average cosine similarity (Equation \ref{eq:avgCos}) than for OOD samples. This observation, inspired by the trend of Neural Collapse, emerges early in the training of this CIFAR-10 ResNet-18 classifier, with SVHN as the OOD set.
  • Figure 2: Framework Illustration. Left: On the penultimate layer, the centered ID clusters reside near their predicted class weight vectors (marked by stars), while OOD samples lie apart from them, as shown by UMAP. Middle: ID and OOD samples are separated by $\mathtt{pScore}$ (Equation \ref{eq:prox}), which measures feature proximity to weight vectors. ID samples also tend to be farther from the origin, illustrated with $\mathtt{L1}$ norms. Right: ID samples cluster near a simplex Equiangular Tight Frame, illustrated with black arrows denoting weight vectors. We detect OOD samples by thresholding on $\mathtt{pScore}$, which selects the blue-shaded hypercones centered at the weight vectors and leaves OOD samples outside these regions. We also filter out OOD samples characterized by smaller feature norms. Left and Middle show a practical off-the-shelf CIFAR-10 ResNet-18 classifier with SVHN as the OOD set. Right depicts our scheme on a three-class classifier with a 2D penultimate space.
  • Figure 3: (ref. Figure 2 in Papyan et al., 2020) Train class means become equinorm. In each array cell, the vertical axis shows the coefficient of variation of the centered class-mean norms as well as of the network classifier norms. In particular, the blue lines show $\text{Std}_c(\|\bm{\mu}_c - \bm{\mu}_G \|_2)/\text{Avg}_c(\|\bm{\mu}_c - \bm{\mu}_G\|_2)$, where $\{\bm{\mu}_c \}$ are the class means of the last-layer activations of the training data and $\bm{\mu}_G$ is the corresponding train global mean; the orange lines show $\text{Std}_c(\|\bm{w}_c\|_2)/\text{Avg}_c(\|\bm{w}_c\|_2)$, where $\bm{w}_c$ is the last-layer classifier of the $c$-th class. As training progresses, the coefficients of variation of both class means and classifiers decrease.
  • Figure 4: (ref. Figure 3 in Papyan et al., 2020) Classifiers and train class means approach equiangularity. In each array cell, the vertical axis shows the standard deviation (SD) of the cosines between pairs of centered class means and between pairs of classifiers, across all distinct pairs of classes $c$ and $c'$. Mathematically, denote $\cos_\mu(c, c') = \frac{\langle \bm{\mu}_c - \bm{\mu}_G, \bm{\mu}_{c'} - \bm{\mu}_G \rangle}{\|\bm{\mu}_c - \bm{\mu}_G\|_2 \, \|\bm{\mu}_{c'} - \bm{\mu}_G\|_2}$ and $\cos_w(c, c') = \frac{\langle \bm{w}_c, \bm{w}_{c'} \rangle}{\|\bm{w}_c\|_2 \, \|\bm{w}_{c'}\|_2}$, where $\{\bm{w}_c\}_{c = 1}^C$, $\{\bm{\mu}_c\}_{c = 1}^C$, and $\bm{\mu}_G$ are as in Figure 3. We measure $\text{Std}_{c, c'}(\cos_\mu(c, c'))$ (blue) and $\text{Std}_{c, c'}(\cos_w(c, c'))$ (orange). As training progresses, the SDs of the cosines approach zero, indicating equiangularity.
  • Figure 5: (ref. Figure 4 in Papyan et al., 2020) Classifiers and train class means approach maximal-angle equiangularity. The vertical axis of each cell shows the quantities $\text{Avg}_{c, c'} |\cos_\mu(c, c') + 1/(C -1)|$ (blue) and $\text{Avg}_{c, c'} |\cos_w (c, c') + 1/(C -1)|$ (orange), where $\cos_\mu (c, c')$ and $\cos_w (c, c')$ are as in Figure 4. As training progresses, these values converge to zero, which implies that all cosines converge to $-1/(C-1)$. This corresponds to the maximum separation possible for globally centered, equiangular vectors.
  • ...and 2 more figures
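
The three quantities plotted in Figures 3 to 5 can be computed from a matrix of class vectors as sketched below. This is an illustrative sketch, not code from either paper: the function names `neural_collapse_metrics` and `simplex_etf` are hypothetical, and the input is assumed to hold one row per class, either the already-centered class means $\bm{\mu}_c - \bm{\mu}_G$ or the classifier rows $\bm{w}_c$.

```python
import numpy as np

def neural_collapse_metrics(vectors):
    """Return (equinorm CV, SD of cosines, mean |cos + 1/(C-1)|).

    vectors: (C, P) array with one row per class (assumed already centered
             when it holds class means).
    """
    V = np.asarray(vectors, dtype=float)
    C = V.shape[0]

    # Figure 3: coefficient of variation of the row norms.
    norms = np.linalg.norm(V, axis=1)
    equinorm_cv = norms.std() / norms.mean()

    # Pairwise cosines between distinct classes c != c'.
    unit = V / norms[:, None]
    cos = unit @ unit.T
    off_diag = cos[~np.eye(C, dtype=bool)]

    # Figure 4: SD of the cosines (zero means equiangular).
    equiangular_std = off_diag.std()

    # Figure 5: mean gap to the maximal-separation cosine -1/(C-1).
    max_angle_gap = np.abs(off_diag + 1.0 / (C - 1)).mean()

    return equinorm_cv, equiangular_std, max_angle_gap

def simplex_etf(C):
    """Rows are the C unit-norm vertices of a simplex ETF embedded in R^C."""
    return np.sqrt(C / (C - 1)) * (np.eye(C) - np.ones((C, C)) / C)
```

For a perfect simplex Equiangular Tight Frame, such as the output of `simplex_etf`, all three values are zero up to floating-point error. An orthonormal set of vectors is equinorm and equiangular but not maximally separated, so only the third value is nonzero.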

Theorems & Definitions (2)

  • Theorem 3.1
  • proof