Kernel PCA for Out-of-Distribution Detection

Kun Fang; Qinghua Tao; Kexin Lv; Mingzhen He; Xiaolin Huang; Jie Yang

TL;DR

This work tackles out-of-distribution (OoD) detection by applying Kernel PCA (KPCA) to penultimate-layer DNN features, addressing PCA's linearity limitation with two task-specific, explicit feature mappings. The Cosine Kernel KPCA (CoP) uses a cosine normalization mapping, while the Cosine-Gaussian KPCA (CoRP) adds a Gaussian component via Random Fourier Features (RFFs); in both cases the reconstruction error in the mapped space separates InD from OoD data. The approach achieves state-of-the-art OoD detection on the CIFAR-10 and ImageNet-1K benchmarks across multiple OoD datasets, with favorable inference cost ($O(1)$ for CoP and $O(M)$ for CoRP) compared to $O(N_{tr})$ for nearest-neighbor methods, and it provides theoretical links between the covariance-based (explicit-mapping) and kernel-function formulations of KPCA. While the manually chosen kernels limit generality, the kernel perspective points toward learning the kernel itself (e.g., deep kernel learning) to further improve robustness and scalability in real-world deployment.
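The CoP idea described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the helper names (`cosine_map`, `fit_pca`, `reconstruction_error`), the explained-variance threshold of 0.99, and the synthetic "features" are all assumptions made for the example.

```python
import numpy as np

def cosine_map(z, eps=1e-12):
    """Explicit map for the cosine kernel: scale each feature row to unit L2 norm."""
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    return z / np.maximum(norms, eps)

def fit_pca(x, explained_var=0.99):
    """Return the mean and the top-q principal directions reaching explained_var."""
    mu = x.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x - mu, rowvar=False))
    order = np.argsort(eigvals)[::-1]              # sort eigenpairs, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    q = min(int(np.searchsorted(ratio, explained_var)) + 1, x.shape[1])
    return mu, eigvecs[:, :q]

def reconstruction_error(x, mu, u_q):
    """Norm of what is left after projecting centered rows onto the u_q-subspace."""
    centered = x - mu
    residual = centered - centered @ u_q @ u_q.T
    return np.linalg.norm(residual, axis=1)

# Synthetic stand-ins for penultimate-layer features (hypothetical data):
# InD rows lie near a 3-dimensional subspace with widely varying norms,
# OoD rows point in arbitrary directions.
rng = np.random.default_rng(0)
basis = rng.normal(size=(3, 32))

def sample_ind(n):
    coeffs = rng.normal(size=(n, 3))
    scale = rng.uniform(0.5, 5.0, size=(n, 1))
    return scale * (coeffs @ basis) + 0.01 * rng.normal(size=(n, 32))

train, ind_test = sample_ind(500), sample_ind(100)
ood_test = rng.normal(size=(100, 32))

mu, u_q = fit_pca(cosine_map(train))
err_ind = reconstruction_error(cosine_map(ind_test), mu, u_q)
err_ood = reconstruction_error(cosine_map(ood_test), mu, u_q)
print("mean InD error:", err_ind.mean(), "mean OoD error:", err_ood.mean())
```

Scoring a test sample here touches only `mu` and `u_q`, never the stored training features, which is consistent with the $O(1)$ versus $O(N_{tr})$ comparison in the text. A sample would be flagged as OoD when its error exceeds a threshold chosen on held-out InD data.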

Abstract

Out-of-Distribution (OoD) detection is vital for the reliability of Deep Neural Networks (DNNs). Existing works have shown that Principal Component Analysis (PCA) applied directly to DNN features is insufficient for distinguishing OoD data from In-Distribution (InD) data. The failure of PCA suggests that OoD and InD network features are not well separated within a linear subspace, a limitation that proper non-linear mappings can resolve. In this work, we leverage the framework of Kernel PCA (KPCA) for OoD detection and seek suitable non-linear kernels that promote separability between InD and OoD data in the subspace spanned by the principal components. In addition, we adopt explicit feature mappings induced from the task-specific kernels, so that the KPCA reconstruction error for new test samples can be obtained efficiently on large-scale data. Extensive theoretical and empirical results on multiple OoD data sets and network structures verify the superiority of our KPCA detector in efficiency and efficacy, with state-of-the-art detection performance.
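To make the role of an explicit feature mapping concrete, the sketch below approximates a Gaussian kernel with Random Fourier Features on cosine-normalized features and then scores samples by PCA reconstruction error in the mapped space, in the spirit of CoRP. It is an illustration under stated assumptions, not the authors' code: the kernel is taken to be $k(\boldsymbol{x},\boldsymbol{y})=\exp(-\gamma\|\boldsymbol{x}-\boldsymbol{y}\|_2^2)$, and the helper names, the values of `gamma`, `num_features`, and `q`, and the random input data are hypothetical.

```python
import numpy as np

def cosine_map(z, eps=1e-12):
    """Scale each feature row to unit L2 norm."""
    return z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), eps)

def make_rff(dim, num_features, gamma, rng):
    """Sample an RFF map approximating k(x, y) = exp(-gamma * ||x - y||^2)."""
    # The spectral density of this kernel is Gaussian with variance 2 * gamma.
    omega = rng.normal(scale=np.sqrt(2.0 * gamma), size=(dim, num_features))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    def rff(x):
        return np.sqrt(2.0 / num_features) * np.cos(x @ omega + phase)

    return rff

def fit_residual_scorer(features, q):
    """Fit PCA on mapped features; return a function giving reconstruction errors."""
    mu = features.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(features - mu, full_matrices=False)
    u_q = vt[:q].T

    def score(f):
        centered = f - mu
        return np.linalg.norm(centered - centered @ u_q @ u_q.T, axis=1)

    return score

rng = np.random.default_rng(0)
dim, num_features, gamma, q = 16, 4096, 0.5, 20   # illustrative values only
rff = make_rff(dim, num_features, gamma, rng)

# Check that inner products of mapped features approximate the Gaussian kernel.
x = cosine_map(rng.normal(size=(5, dim)))
approx_gram = rff(x) @ rff(x).T
sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
exact_gram = np.exp(-gamma * sq_dists)
max_gap = np.abs(approx_gram - exact_gram).max()

# Fit on hypothetical training features, then score new samples.
train = cosine_map(rng.normal(size=(200, dim)))
score = fit_residual_scorer(rff(train), q)
test_scores = score(rff(cosine_map(rng.normal(size=(10, dim)))))
print("max kernel approximation gap:", max_gap)
```

Because the mapping is explicit, scoring a new sample needs only the sampled frequencies, the mean, and the principal directions, so its cost depends on the RFF dimension $M$ rather than on the number of training samples.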

Paper Structure (26 sections, 2 theorems, 16 equations, 7 figures, 6 tables, 1 algorithm)

Introduction
Related work
Background
PCA for OoD detection
Random Fourier features
Methodology
Cosine kernel
Cosine-Gaussian kernel
Computation complexity
Experiments on OoD detection
Datasets
Metrics
Comparisons with nearest neighbor searching
Comparisons with regularized reconstruction errors
Analytical discussions with KPCA via kernel functions
...and 11 more sections

Key Result

Proposition 1

The KPCA reconstruction error $e^\Phi(\boldsymbol{\hat{z}})$ can be represented as the norm of features projected in the residual subspace, i.e., the $\boldsymbol{U}^\Phi_p$-subspace with $\boldsymbol{U}^\Phi=[\boldsymbol{U}^\Phi_q,\boldsymbol{U}^\Phi_p]$:
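Written out, the statement takes the following form. This is a sketch under assumptions that this summary does not spell out: $\boldsymbol{\mu}^\Phi$ is taken to denote the mean of the mapped training features, the reconstruction error is taken to be the norm of the difference between a centered mapped feature and its projection onto the $\boldsymbol{U}^\Phi_q$-subspace, and $\boldsymbol{U}^\Phi$ is assumed orthonormal, so that $\boldsymbol{U}^\Phi_q\boldsymbol{U}^{\Phi\top}_q+\boldsymbol{U}^\Phi_p\boldsymbol{U}^{\Phi\top}_p=\boldsymbol{I}$.

$$
\begin{aligned}
e^\Phi(\boldsymbol{\hat{z}})
&= \left\| \left(\boldsymbol{I}-\boldsymbol{U}^\Phi_q\boldsymbol{U}^{\Phi\top}_q\right)\left(\Phi(\boldsymbol{\hat{z}})-\boldsymbol{\mu}^\Phi\right) \right\|_2 \\
&= \left\| \boldsymbol{U}^\Phi_p\boldsymbol{U}^{\Phi\top}_p\left(\Phi(\boldsymbol{\hat{z}})-\boldsymbol{\mu}^\Phi\right) \right\|_2 \\
&= \left\| \boldsymbol{U}^{\Phi\top}_p\left(\Phi(\boldsymbol{\hat{z}})-\boldsymbol{\mu}^\Phi\right) \right\|_2 .
\end{aligned}
$$

The last step holds because the columns of $\boldsymbol{U}^\Phi_p$ are orthonormal, so multiplying by $\boldsymbol{U}^\Phi_p$ preserves the Euclidean norm.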

Figures (7)

Figure 1: The t-SNE (van der Maaten and Hinton, 2008) visualization of the original features $\boldsymbol{z}$ (left) and the mapped features $\Phi(\boldsymbol{z})$ (right). Our KPCA detector alleviates the linear inseparability between InD and OoD features in the original $\boldsymbol{z}$-space via an explicit feature mapping $\Phi$, and thus substantially improves the OoD detection performance, as illustrated by the much more distinguishable reconstruction errors.
Figure 2: Comparison of the average detection FPR values between CoP/CoRP and their kernel function implementations on the CIFAR10 benchmark. In these experiments, 5,000 images from the CIFAR10 training set and 1,000 images each from the CIFAR10 test set and the OoD data sets are randomly selected.
Figure 3: A density histogram of the imbalanced norms of InD and OoD features. InD: CIFAR10 and ImageNet-1K. OoD: LSUN and Places365, SUN and Textures.
Figure 4: t-SNE visualization of the original features (left), mapped features w.r.t. a Gaussian kernel (middle) and mapped features w.r.t. a cosine kernel (right).
Figure 5: A sensitivity analysis on the explained variance ratio of CoP (top) and CoRP (bottom). The average FPR and AUROC values of OoD data sets in CIFAR10 and ImageNet-1K benchmarks are reported. The Gaussian kernel width $\gamma$ and the dimension $M$ of RFFs in CoRP are fixed.
...and 2 more figures

Theorems & Definitions (3)

Proposition 1
Proposition 2
proof
