RankFeat&RankWeight: Rank-1 Feature/Weight Removal for Out-of-distribution Detection

Yue Song; Wei Wang; Nicu Sebe

TL;DR

This work presents RankFeat and RankWeight as simple post hoc OOD detection techniques that remove rank-1 subspaces from high-level feature maps and from the last-layer weight matrices, respectively. RankFeat perturbs OOD predictions by subtracting the dominant singular component, yielding strong separation between ID and OOD data, while RankWeight provides a cheap, single-layer weight perturbation that can also boost other OOD detectors when used as a plugin. The authors provide extensive theoretical analyses, including upper-bound reductions and connections to random matrix theory and ReAct, along with comprehensive experiments across CNNs and vision transformers on ImageNet-1k and a large-scale Species dataset, achieving state-of-the-art FPR95 and AUROC. The method is broadly compatible with existing OOD approaches and offers practical benefits, including low computational overhead and easy integration, with public code available.
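The rank-1 feature removal described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The feature layout `(C, H*W)`, the global average pooling, the linear classifier (`weight`, `bias`), and the log-sum-exp score are assumptions made for illustration, and all function names are hypothetical.

```python
import numpy as np

def remove_rank1(mat):
    """Subtract s1 * u1 * v1^T (the dominant singular component) from a 2-D array."""
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    return mat - s[0] * np.outer(u[:, 0], vt[0])

def rankfeat_score(feat, weight, bias):
    """Hypothetical scoring head (an assumption, not taken from the paper text here).

    feat:   (C, H*W) high-level feature matrix
    weight: (Q, C) classifier weights, bias: (Q,) classifier bias
    """
    pruned = remove_rank1(feat)
    pooled = pruned.mean(axis=1)            # average over the H*W spatial positions
    logits = weight @ pooled + bias
    m = logits.max()
    return m + np.log(np.exp(logits - m).sum())  # numerically stable log-sum-exp

# Toy data standing in for a real backbone feature and classifier.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16))
weight = rng.standard_normal((5, 8))
bias = rng.standard_normal(5)

score = rankfeat_score(feat, weight, bias)
s_before = np.linalg.svd(feat, compute_uv=False)
s_after = np.linalg.svd(remove_rank1(feat), compute_uv=False)
```

After the removal, the remaining singular values are the original ones with the largest dropped, which is why the sum in the score's upper bound excludes $\mathbf{s}_{1}$.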

Abstract

The task of out-of-distribution (OOD) detection is crucial for deploying machine learning models in real-world settings. In this paper, we observe that the singular value distributions of the in-distribution (ID) and OOD features are quite different: the OOD feature matrix tends to have a larger dominant singular value than the ID feature, and the class predictions of OOD samples are largely determined by it. This observation motivates us to propose \texttt{RankFeat}, a simple yet effective \emph{post hoc} approach for OOD detection that removes the rank-1 matrix composed of the largest singular value and the associated singular vectors from the high-level feature. \texttt{RankFeat} achieves \emph{state-of-the-art} performance and reduces the average false positive rate (FPR95) by 17.90\% compared with the previous best method. The success of \texttt{RankFeat} motivates us to investigate whether a similar phenomenon exists in the parameter matrices of neural networks. We thus propose \texttt{RankWeight}, which removes the rank-1 weight from the parameter matrices of a single deep layer. \texttt{RankWeight} is also \emph{post hoc} and only requires computing the rank-1 matrix once. As a standalone approach, \texttt{RankWeight} is very competitive with other methods across various backbones. Moreover, \texttt{RankWeight} is compatible with a wide range of OOD detection methods. The combination of \texttt{RankWeight} and \texttt{RankFeat} sets a new \emph{state of the art}, achieving an FPR95 as low as 16.13\% on the ImageNet-1k benchmark. Extensive ablation studies and comprehensive theoretical analyses are presented to support the empirical results. Code is publicly available via \url{https://github.com/KingJamesSong/RankFeat}.
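The one-off weight pruning of RankWeight can be sketched in the same way. This is a hedged illustration rather than the released code: flattening a convolution kernel of shape `(C_out, C_in, k, k)` into a `(C_out, C_in*k*k)` matrix is an assumption, and `rankweight_prune` is a hypothetical helper name.

```python
import numpy as np

def rankweight_prune(weight):
    """Return a copy of `weight` with its dominant rank-1 component removed.

    The tensor is flattened to (out_channels, -1) before the SVD; this
    reshaping convention is an assumption made for illustration.
    """
    mat = weight.reshape(weight.shape[0], -1)
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    pruned = mat - s[0] * np.outer(u[:, 0], vt[0])
    return pruned.reshape(weight.shape)

# Toy convolution kernel standing in for the weights of one deep layer.
rng = np.random.default_rng(0)
conv_w = rng.standard_normal((6, 4, 3, 3))

# Computed once; the pruned weights can then be reused for every input.
pruned_w = rankweight_prune(conv_w)

s_orig = np.linalg.svd(conv_w.reshape(6, -1), compute_uv=False)
s_pruned = np.linalg.svd(pruned_w.reshape(6, -1), compute_uv=False)
```

Because the decomposition depends only on the parameters and not on the input, it does not need to be repeated at inference time, in line with the abstract's remark that the rank-1 matrix is computed only once.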

Paper Structure (49 sections, 3 theorems, 48 equations, 13 figures, 14 tables)

Introduction
Related Work
Distribution Shifts
OOD Detection with Discriminative Models
OOD Detection with Generative Models
RankFeat&RankWeight: Rank-1 Feature/Weight Removal for OOD Detection
Preliminary: OOD Detection
RankFeat: Rank-1 Feature Removal
Acceleration by Power Iteration
Combination of Multi-scale Features
RankWeight: Rank-1 Weight Removal
Layers to Prune
No Incurred Time Costs
Combination with RankFeat
Compatibility with Other OOD Approaches
...and 34 more sections

Key Result

Proposition 1

The upper bound of RankFeat score is defined as $\texttt{RankFeat}(\mathbf{x}) <\frac{1}{HW} (\sum_{i=1}^{N} \mathbf{s}_{i} - \mathbf{s}_{1}) ||\mathbf{W}||_{\infty} + ||\mathbf{b}||_{\infty} + \log(Q)$ where $Q$ denotes the number of categories, and $\mathbf{W}$ and $\mathbf{b}$ are the weight and

Figures (13)

Figure 1: (a) The distribution of top-5 singular values for the ID and OOD features on ImageNet-1k and SUN. The OOD feature matrix tends to have a significantly larger dominant singular value. (b) After removing the rank-1 matrix composed of the dominant singular value and singular vectors, the class predictions of OOD data are severely perturbed, while those of ID data are only moderately affected. This observation indicates that the decisions on OOD data depend heavily on the dominant singular value and the corresponding singular vectors of the feature matrix. In light of this finding, we propose RankFeat for OOD detection, which removes the rank-1 matrix from the high-level feature. (c) After pruning the parameters of a single deep layer by removing the rank-1 matrix in the same way, the class predictions of ID and OOD data behave differently: most ID samples keep consistent class predictions, while those of OOD data are largely perturbed. This implies that the rank-1 parameter matrix of deep layers also plays a crucial role in making decisions about data samples. We thus propose RankWeight for post hoc OOD detection, which removes the rank-1 matrix from the deep parameter matrix of only one layer. The observations also hold for other OOD datasets.
Figure 2: Visual illustration of our RankFeat and RankWeight: RankFeat removes the rank-1 feature matrix from the deep layers, while RankWeight perturbs the weight matrix of deep layers by removing the rank-1 subspace similarly.
Figure 3: The score distributions of Energy [liu2020energy] (top row), our RankFeat (middle row), and our proposed RankFeat+RankWeight (bottom row) on four OOD datasets. Our method can better separate the ID and OOD data.
Figure 4: Impact of RankFeat on the class predictions of ID and OOD data. The class predictions of OOD data are significantly more perturbed than those of ID data.
Figure 5: An example eigenvalue distribution of the ID/OOD features and the fitted MP distribution. After the rank-1 matrix is removed, the lowest bin of the OOD feature shows a larger reduction and the middle bins grow, bringing the OOD feature statistics closer to the MP distribution.
...and 8 more figures

Theorems & Definitions (5)

Proposition 1
proof
Theorem 1: Marchenko-Pastur Law [marvcenko1967distribution, sengupta1999distributions]
Theorem 2
proof
