Kernel PCA for Out-of-Distribution Detection: Non-Linear Kernel Selections and Approximations

Kun Fang, Qinghua Tao, Mingzhen He, Kexin Lv, Runze Yang, Haibo Hu, Xiaolin Huang, Jie Yang, Longbing Cao

TL;DR

This work reframes OoD detection as learning a discriminative non-linear subspace of InD features via Kernel PCA (KPCA). It introduces a Cosine-Gaussian kernel to capture two key non-linear patterns relating InD and OoD distributions and provides two scalable kernel-approximation schemes, Random Fourier Features and a data-dependent Nyström method, to compute reconstruction errors efficiently. An energy-based Nyström sampling strategy further enhances subspace learning by focusing on boundary regions between InD and OoD. Empirically, the proposed KPCA framework achieves state-of-the-art OoD detection performance on ImageNet-1K with ResNet-50 and ViT, while maintaining low inference cost and memory; the work also offers detailed analysis of kernel choices, sampling schemes, and hyper-parameter sensitivity. Overall, the approach provides a practical, kernel-design-oriented pathway to robust OoD detection in large-scale deep learning systems.

Abstract

Out-of-Distribution (OoD) detection is vital for the reliability of deep neural networks, and its key lies in effectively characterizing the disparities between OoD and In-Distribution (InD) data. In this work, such disparities are exploited through the fresh perspective of a non-linear feature subspace. That is, a discriminative non-linear subspace is learned from InD features to capture representative patterns of InD, while informative patterns of OoD features cannot be well captured in such a subspace due to their different distribution. Grounded in this perspective, we exploit the deviations of InD and OoD features in such a non-linear subspace for effective OoD detection. Specifically, we leverage the framework of Kernel Principal Component Analysis (KPCA) to obtain the discriminative non-linear subspace and use the reconstruction error on this subspace to distinguish InD and OoD data. Two challenges emerge: (i) the learning of an effective non-linear subspace, i.e., the selection of the kernel function in KPCA, and (ii) the computation of the kernel matrix with large-scale InD data. For the former, we reveal two vital non-linear patterns that closely relate to the InD-OoD disparity, leading to the establishment of a Cosine-Gaussian kernel for constructing the subspace. For the latter, we introduce two techniques to approximate the Cosine-Gaussian kernel at significantly lower computational cost. In particular, our approximation is further tailored by incorporating the InD data confidence, which is demonstrated to promote the learning of discriminative subspaces for OoD data. Our study presents new insights into the non-linear feature subspace for OoD detection and contributes practical explorations on the associated kernel design and efficient computations, yielding a KPCA detection method with distinctly improved efficacy and efficiency.
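The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the Cosine-Gaussian kernel amounts to a Gaussian kernel applied to $\ell_2$-normalized features, approximates that kernel with Random Fourier Features, and scores samples by the linear PCA reconstruction error in the mapped space. The function names, `gamma`, `n_features`, and `q` are hypothetical and chosen only for illustration.

```python
import numpy as np

def l2_normalize(z, eps=1e-12):
    # Assumed cosine step: project every feature vector onto the unit sphere.
    return z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), eps)

def make_rff(dim, n_features, gamma, rng):
    # Random Fourier Features for exp(-gamma * ||x - y||^2):
    # frequencies are drawn from N(0, 2 * gamma * I), phases from U[0, 2*pi).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(dim, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def phi(z):
        # Explicit map whose inner products approximate the kernel values.
        return np.sqrt(2.0 / n_features) * np.cos(l2_normalize(z) @ W + b)

    return phi

def fit_pca(features, q):
    # Mean and top-q principal directions of the mapped InD features.
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:q].T

def reconstruction_error(features, mean, components):
    # Norm of the residual after projecting onto the learned subspace.
    centered = features - mean
    residual = centered - (centered @ components) @ components.T
    return np.linalg.norm(residual, axis=1)
```

In use, `fit_pca` would be called once on the mapped InD training features, and `reconstruction_error` would then serve as the detection score for test features, with larger values suggesting OoD. On unit vectors the Gaussian kernel reduces to $\exp(-2\gamma(1-\cos\theta))$, so under this assumption it depends on the features only through their cosine similarity.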

Paper Structure

This paper contains 34 sections, 1 theorem, 11 equations, 10 figures, 7 tables, 1 algorithm.

Key Result

Proposition 1

The exact KPCA reconstruction error $e^k(\boldsymbol{\hat{z}})$ can be calculated via an expression involving ${\bf U}^{k}_p\in\mathbb{R}^{N_{\rm tr}\times p}$, which contains the last $p$ columns of ${\bf U}^{k}$, i.e., the $p$ eigenvectors corresponding to the $p$ smallest eigenvalues in ${\bf \Lambda}^{k}$.
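For intuition about why the exact computation is expensive, here is a textbook sketch of a KPCA reconstruction error computed directly from the full kernel matrix. It is not the expression of Proposition 1: it uses the complementary, standard form that projects onto the $q$ leading eigenvectors, it skips centering in feature space, and it assumes a plain Gaussian kernel. All function and parameter names are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, gamma):
    # Pairwise exp(-gamma * ||a_i - b_j||^2).
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_exact_kpca(x_train, gamma):
    # Eigendecomposition of the full N x N kernel matrix (uncentered).
    K = gaussian_kernel(x_train, x_train, gamma)
    eigvals, eigvecs = np.linalg.eigh(K)        # ascending order
    return eigvals[::-1], eigvecs[:, ::-1]      # reorder to descending

def exact_reconstruction_error(z, x_train, eigvals, eigvecs, q, gamma):
    # Coordinates of Phi(z) along the q leading feature-space directions.
    k_z = gaussian_kernel(z, x_train, gamma)    # shape (n_test, N)
    coords = (k_z @ eigvecs[:, :q]) / np.sqrt(eigvals[:q])
    # Squared error = k(z, z) - squared norm of the projection; k(z, z) = 1 here.
    sq_err = 1.0 - (coords ** 2).sum(axis=1)
    return np.sqrt(np.maximum(sq_err, 0.0))
```

Every test sample requires kernel evaluations against all $N_{\rm tr}$ training points, and fitting requires an eigendecomposition of an $N_{\rm tr}\times N_{\rm tr}$ matrix, which is the cost that explicit kernel approximations aim to avoid.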

Figures (10)

  • Figure 1: The t-SNE \cite{van2008visualizing} visualization of the original features $\boldsymbol{z}$ (left) and the features $\Phi(\boldsymbol{z})$ in the subspace (right). Our KPCA detection method alleviates the linear inseparability between InD and OoD features in the original $\boldsymbol{z}$-space via the mapping $\Phi$, with substantially improved OoD detection performance, as illustrated by the distinguishable reconstruction errors.
  • Figure 2: The framework of our KPCA detection method. A Cosine-Gaussian kernel is devised to model the non-linearity related to the InD-OoD disparity in the $\boldsymbol{z}$-space (Section \ref{sec:non-linear-kernel}). Explicit mappings $\Phi$ are built to approximate the Cosine-Gaussian kernel for efficient computations in the $\Phi(\boldsymbol{z})$-space (Section \ref{sec:method:kernel-approximation}). The left and right histograms show the PCA and KPCA reconstruction errors on InD and OoD data, respectively, indicating the effectiveness of the Cosine-Gaussian kernel in promoting the linear separability between InD and OoD data.
  • Figure 3: Illustrations of the impact of the cosine kernel $k_{\rm cos}$. (a) Density histograms showing the imbalance of InD and OoD feature norms $\|\boldsymbol{z}\|_2$. (b) The cosine kernel $k_{\rm cos}$ significantly reduces the distance between the InD feature means and the OoD feature means, which benefits the centering process in PCA. The red stars denote the feature means. The dashed and solid lines denote the distance between feature means without and with the cosine kernel $k_{\rm cos}$, respectively. (c) and (d) The cosine kernel $k_{\rm cos}$ leads to more separable reconstruction errors of InD and OoD features and thereby boosts OoD detection performance. InD: CIFAR10 \cite{krizhevsky2009learning}. OoD: LSUN \cite{yu2015lsun} and Places365 \cite{zhou2017places}.
  • Figure 4: Low-dimensional embeddings via MDS \cite{borg2007modern} of InD and OoD data in the $\boldsymbol{z}$-space and the $\phi_{\rm cos}(\boldsymbol{z})$-space, respectively. InD: CIFAR10 \cite{krizhevsky2009learning}. OoD: LSUN \cite{yu2015lsun} and Places365 \cite{zhou2017places}.
  • Figure 5: A comparison of naive uniform sampling and the proposed low-energy sampling, together with their corresponding subspaces.
  • ...and 5 more figures
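
The low-energy sampling contrasted with uniform sampling in Figure 5 can be sketched as a landmark-selection step for a Nyström feature map. The following is a rough illustration under stated assumptions, not the authors' procedure: the "energy" is taken to be the log-sum-exp of the classifier logits, low values are treated as low-confidence InD samples, and the kernel is a Gaussian on $\ell_2$-normalized features. The helper names, `gamma`, and `eps` are hypothetical.

```python
import numpy as np

def l2_normalize(z, eps=1e-12):
    # Assumed cosine step: rescale each feature vector to unit norm.
    return z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), eps)

def gaussian_kernel(a, b, gamma):
    # Pairwise exp(-gamma * ||a_i - b_j||^2).
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def energy_score(logits):
    # Assumed definition: numerically stable log-sum-exp of the logits.
    m = logits.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))).ravel()

def select_low_energy_landmarks(features, logits, n_landmarks):
    # Keep the InD samples with the smallest energy (assumed least confident).
    order = np.argsort(energy_score(logits))
    return features[order[:n_landmarks]]

def make_nystrom_map(landmarks, gamma, eps=1e-10):
    # Nystrom map: phi(z) = k(z, landmarks) @ K_mm^{-1/2}, dropping tiny eigenvalues.
    anchors = l2_normalize(landmarks)
    eigvals, eigvecs = np.linalg.eigh(gaussian_kernel(anchors, anchors, gamma))
    keep = eigvals > eps
    whiten = eigvecs[:, keep] / np.sqrt(eigvals[keep])

    def phi(z):
        return gaussian_kernel(l2_normalize(z), anchors, gamma) @ whiten

    return phi
```

The resulting `phi` plays the same role as any other explicit feature map: linear PCA is fitted on the mapped InD features, and the reconstruction error becomes the detection score. Replacing `select_low_energy_landmarks` with a uniform random choice of rows gives the naive baseline.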

Theorems & Definitions (2)

  • Proposition 1
  • proof