Detecting Out-of-Distribution Through the Lens of Neural Collapse

Litian Liu; Yao Qin

TL;DR

Out-of-distribution (OOD) detection remains challenging because detectors generalize unevenly across datasets and architectures, and real-time deployment demands low latency. The authors introduce the Neural Collapse Inspired OOD Detector (NCI), which separates in-distribution (ID) from OOD samples using two geometric cues: centered ID features cluster near their predicted class weight vectors, and ID features expand toward a simplex Equiangular Tight Frame, which places them farther from the origin than OOD features. NCI combines feature proximity to weight vectors with an origin-distance filter at $O(P)$ complexity. By formalizing Neural Collapse (NC1–NC4) and demonstrating that its trend appears in standard models, they show that these cues enable effective OOD detection on CIFAR-10/100 and ImageNet, with latency comparable to the maximum softmax probability (MSP) baseline and with reduced generalization discrepancies. The work unifies clustering- and energy-based perspectives under a Neural Collapse lens and provides a scalable detector, with released code, that adapts across CNNs and vision transformers.

Abstract

Out-of-Distribution (OOD) detection is critical for safe deployment; however, existing detectors often struggle to generalize across datasets of varying scales and model architectures, and some can incur high computational costs in real-world applications. Inspired by the phenomenon of Neural Collapse, we propose a versatile and efficient OOD detection method. Specifically, we re-characterize prior observations that in-distribution (ID) samples form clusters, demonstrating that, with appropriate centering, these clusters align closely with model weight vectors. Additionally, we reveal that ID features tend to expand into a simplex Equiangular Tight Frame, explaining the common observation that ID features are situated farther from the origin than OOD features. Incorporating both insights from Neural Collapse, our OOD detector leverages feature proximity to weight vectors and complements this approach by using feature norms to effectively filter out OOD samples. Extensive experiments on off-the-shelf models demonstrate the robustness of our OOD detector across diverse scenarios, mitigating generalization discrepancies and enhancing overall performance, with inference latency comparable to that of the basic softmax-confidence detector. Code is available here: https://github.com/litianliu/NCI-OOD.
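
The two cues described in the abstract (proximity of centered features to the predicted class weight vector, plus a feature-norm term) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain cosine similarity as a stand-in for the proximity score, whose exact definition is not reproduced here, and the function names, the weighting `alpha`, and the toy classifier are hypothetical. See the linked repository for the actual method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # small constant avoids division by zero


def predicted_class(feature, weights, biases):
    """Index of the largest logit of a linear last layer."""
    logits = [sum(w_k * h_k for w_k, h_k in zip(w, feature)) + b
              for w, b in zip(weights, biases)]
    return max(range(len(logits)), key=logits.__getitem__)


def ood_score(feature, weights, biases, global_mean, alpha=0.01):
    """Illustrative score: higher means more ID-like.

    Assumption: proximity is the cosine between the centered feature and the
    predicted class weight vector; the norm cue is the L1 norm of the raw
    feature, weighted by a hypothetical hyperparameter `alpha`.
    """
    c = predicted_class(feature, weights, biases)
    centered = [h - m for h, m in zip(feature, global_mean)]
    proximity = cosine(centered, weights[c])
    feature_l1 = sum(abs(h) for h in feature)
    return proximity + alpha * feature_l1


# Toy three-class classifier in a 2D penultimate space (weights form a simplex ETF).
weights = [[1.0, 0.0],
           [-0.5, math.sqrt(3) / 2],
           [-0.5, -math.sqrt(3) / 2]]
biases = [0.0, 0.0, 0.0]
global_mean = [0.0, 0.0]

id_like = [4.0, 0.2]    # far from the origin, close to the direction of class 0
ood_like = [0.1, 0.3]   # near the origin, between weight directions

print(ood_score(id_like, weights, biases, global_mean))
print(ood_score(ood_like, weights, biases, global_mean))
```

A sample would be flagged as OOD when its score falls below a threshold chosen on held-out ID data. Each score needs one pass over the feature vector beyond the forward pass, which fits the abstract's claim of latency close to a softmax-confidence detector.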

Paper Structure (24 sections, 1 theorem, 19 equations, 7 figures, 9 tables)

This paper contains 24 sections, 1 theorem, 19 equations, 7 figures, 9 tables.

Introduction
Problem Statement
OOD Detection through the Lens of Neural Collapse
Neural Collapse: Convergence Landscape
Trend of Neural Collapse & Geometric Structure of the ID Clusters
Out-of-Distribution Detection
Experiments
Mitigating Discrepancies across ID Datasets
Mitigating Discrepancies across Architectures
Ablation on the Filtering Effect
Related Work
OOD Detection
Neural Collapse
Conclusion
Implementation Details
...and 9 more sections

Key Result

Theorem 3.1

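(NC1) and (NC3) imply that for any sample $i$ and its predicted class $c$, we have $(\bm{h}_{i,c} - \bm{\mu}_G) \rightarrow \lambda \bm{w}_c$, where $\bm{h}_{i,c}$ is the penultimate-layer feature of sample $i$ and $\lambda = \frac{\| \bm{\mu}_{c} - \bm{\mu}_G\|_2}{\|\bm{w}_c\|_2}$ in the Terminal Phase of Training.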
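
A short sketch of why (NC1) and (NC3) tie a centered feature to its class weight vector through the factor $\lambda$ is given below. It relies on the standard statements of NC1 and NC3 from papyan2020prevalence, written here per class as a simplification, and the symbol $\bm{h}_{i,c}$ for the penultimate-layer feature of sample $i$ with predicted class $c$ is an assumption, since this page does not define it. It is an outline under those assumptions, not the paper's proof.

```latex
\begin{align*}
% NC1 (variability collapse): features converge to their class mean.
\bm{h}_{i,c} - \bm{\mu}_c &\to \bm{0} \\
% NC3 (self-duality): normalized classifier and normalized centered class mean coincide.
\frac{\bm{w}_c}{\|\bm{w}_c\|_2}
  - \frac{\bm{\mu}_c - \bm{\mu}_G}{\|\bm{\mu}_c - \bm{\mu}_G\|_2} &\to \bm{0} \\
% Rearranging NC3 with \lambda = \|\bm{\mu}_c - \bm{\mu}_G\|_2 / \|\bm{w}_c\|_2:
\bm{\mu}_c - \bm{\mu}_G &\to \lambda \bm{w}_c \\
% Combining with NC1:
\bm{h}_{i,c} - \bm{\mu}_G
  = (\bm{h}_{i,c} - \bm{\mu}_c) + (\bm{\mu}_c - \bm{\mu}_G) &\to \lambda \bm{w}_c
\end{align*}
```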

Figures (7)

Figure 1: Centered ID samples tend to cluster near their predicted class weight vectors (the last-layer weights of the corresponding class), as indicated by a higher average cosine similarity (Equation \ref{eq:avgCos}) than for OOD samples. This observation, inspired by the trend of Neural Collapse, emerges early in the training of this CIFAR-10 ResNet-18 classifier, with SVHN as the OOD set.
Figure 2: Framework Illustration. Left: On the penultimate layer, the centered ID clusters reside near their predicted class weight vectors (marked by stars) while OOD samples lie apart from them, as shown by UMAP. Middle: ID and OOD samples are separated by $\mathtt{pScore}$ (Equation \ref{eq:prox}), which measures feature proximity to weight vectors. ID samples also tend to be farther from the origin, illustrated with $\mathtt{L1}$ norms. Right: ID samples cluster near a simplex Equiangular Tight Frame, illustrated with black arrows denoting weight vectors. We detect OOD samples by thresholding on $\mathtt{pScore}$, which selects blue-shaded hypercones centered at the weight vectors and leaves OOD samples outside these areas. We also filter out OOD samples characterized by smaller feature norms. Left and Middle show a practical off-the-shelf CIFAR-10 ResNet-18 classifier with SVHN as the OOD set. Right depicts our scheme on a three-class classifier with a 2D penultimate space.
Figure 3: (ref. Figure 2 in papyan2020prevalence) Train class means become equinorm. In each array cell, the vertical axis shows the coefficient of variation of the centered class-mean norms as well as of the network classifier norms. In particular, the blue lines show $\text{Std}_c(\|\bm{\mu}_c - \bm{\mu}_G \|_2)/\text{Avg}_c(\|\bm{\mu}_c - \bm{\mu}_G\|_2)$, where $\{\bm{\mu}_c \}$ are the class means of the last-layer activations of the training data and $\bm{\mu}_G$ is the corresponding train global mean; the orange lines show $\text{Std}_c(\|\bm{w}_c\|_2)/\text{Avg}_c(\|\bm{w}_c\|_2)$, where $\bm{w}_c$ is the last-layer classifier of the $c$-th class. As training progresses, the coefficients of variation of both class means and classifiers decrease.
Figure 4: (ref. Figure 3 in papyan2020prevalence) Classifiers and train class means approach equiangularity. In each array cell, the vertical axis shows the standard deviation (SD) of the cosines between pairs of centered class means and between pairs of classifiers, across all distinct pairs of classes $c$ and $c'$. Mathematically, denote $\cos_\mu(c, c') = \frac{\langle \bm{\mu}_c - \bm{\mu}_G, \bm{\mu}_{c'} - \bm{\mu}_G \rangle}{\|\bm{\mu}_c - \bm{\mu}_G\|_2 \|\bm{\mu}_{c'} - \bm{\mu}_G\|_2}$ and $\cos_w(c, c') = \frac{\langle \bm{w}_c, \bm{w}_{c'} \rangle}{\|\bm{w}_c\|_2 \|\bm{w}_{c'}\|_2}$, where $\{\bm{w}_c\}_{c = 1}^C$, $\{\bm{\mu}_c\}_{c = 1}^C$, and $\bm{\mu}_G$ are as in Figure \ref{fig:nc_fig02}. We measure $\text{Std}_{c, c'}(\cos_\mu(c, c'))$ (blue) and $\text{Std}_{c, c'}(\cos_w(c, c'))$ (orange). As training progresses, the SDs of the cosines approach zero, indicating equiangularity.
Figure 5: (ref. Figure 4 in papyan2020prevalence) Classifiers and train class means approach maximal-angle equiangularity. On the vertical axis of each cell, we plot the quantities $\text{Avg}_{c, c'} |\cos_\mu(c, c') + 1/(C -1)|$ (blue) and $\text{Avg}_{c, c'} |\cos_w (c, c') + 1/(C -1)|$ (orange), where $\cos_\mu (c, c')$ and $\cos_w (c, c')$ are as in Figure \ref{fig:nc_fig03}. As training progresses, the convergence of these values to zero implies that all cosines converge to $-1/(C-1)$. This corresponds to the maximum separation possible for globally centered, equiangular vectors.
...and 2 more figures
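
The three quantities tracked in the captions of Figures 3 to 5 (coefficient of variation of the norms, SD of the pairwise cosines, and average gap to $-1/(C-1)$) can be computed as in the sketch below. The function names are hypothetical, the population standard deviation is an arbitrary choice, and the global mean is taken as the mean of the class means, which matches the train global mean only under the assumption of balanced classes.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def _cos(u, v):
    return sum(a * b for a, b in zip(u, v)) / (_norm(u) * _norm(v))


def center(class_means):
    """Subtract the global mean (assumed here: mean of class means, balanced classes)."""
    dim = len(class_means[0])
    global_mean = [mean(m[k] for m in class_means) for k in range(dim)]
    return [[m[k] - global_mean[k] for k in range(dim)] for m in class_means]


def equinorm_cv(vectors):
    """Std_c(||v_c||_2) / Avg_c(||v_c||_2), the quantity described for Figure 3."""
    norms = [_norm(v) for v in vectors]
    return pstdev(norms) / mean(norms)


def equiangular_std(vectors):
    """Std over distinct class pairs of the pairwise cosines (Figure 4)."""
    return pstdev([_cos(u, v) for u, v in combinations(vectors, 2)])


def max_angle_gap(vectors):
    """Avg over distinct class pairs of |cos + 1/(C-1)| (Figure 5)."""
    num_classes = len(vectors)
    gaps = [abs(_cos(u, v) + 1.0 / (num_classes - 1))
            for u, v in combinations(vectors, 2)]
    return mean(gaps)


# Toy check: three class means forming a shifted simplex ETF in 2D.
offset = [2.0, 3.0]
etf = [[1.0, 0.0], [-0.5, math.sqrt(3) / 2], [-0.5, -math.sqrt(3) / 2]]
class_means = [[e[0] + offset[0], e[1] + offset[1]] for e in etf]
centered = center(class_means)

print(equinorm_cv(centered), equiangular_std(centered), max_angle_gap(centered))
```

For a perfect simplex ETF all three values are zero up to floating-point error. The same functions can be applied to the classifier vectors $\{\bm{w}_c\}$ directly, without centering.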

Theorems & Definitions (2)

Theorem 3.1
proof
