
Uncertainty-Aware Out-of-Distribution Detection with Gaussian Processes

Yang Chen, Chih-Li Sung, Arpan Kusari, Xiaoyang Song, Wenbo Sun

TL;DR

The paper tackles OOD detection for $K$-class classification by introducing a Gaussian-process-based detector that requires no OOD data during training. It emulates the DNN mapping from an intermediate representation to the unconstrained softmax scores with class-specific GPs, then uses a KL-divergence-based score to distinguish InD from OOD samples, with decision thresholds learned solely from InD data. Key contributions include (i) a practical InD-only OOD detector compatible with standard architectures, (ii) a probabilistic framework that provides predictive uncertainty for each class, and (iii) extensive experiments showing strong TNR and AUROC on both conventional and large-scale real-world datasets. The approach offers a principled, uncertainty-aware alternative to OOD methods that are tuned on OOD data or built on generative models, with potential impact on safety-critical deployments where OOD data is rare or unavailable during training.
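
The TL;DR states that thresholds are learned solely from InD data. The sketch below shows one common way to do that; it is an illustration, not the paper's procedure. It assumes that larger scores indicate OOD and that the threshold is the empirical quantile of held-out InD scores that keeps a target fraction of InD points (here 95%, a hypothetical choice) classified as InD. The names `ind_threshold` and `is_ood` are hypothetical.

```python
import math
from typing import Sequence


def ind_threshold(ind_scores: Sequence[float], tpr: float = 0.95) -> float:
    """Empirical threshold keeping a fraction `tpr` of InD scores at or below it.

    Assumes larger scores are more OOD-like (a hypothetical convention).
    """
    if not ind_scores:
        raise ValueError("need at least one InD score")
    if not 0.0 < tpr <= 1.0:
        raise ValueError("tpr must lie in (0, 1]")
    ordered = sorted(ind_scores)
    # Smallest order statistic with at least ceil(tpr * n) scores at or below it.
    idx = max(0, math.ceil(tpr * len(ordered)) - 1)
    return ordered[idx]


def is_ood(score: float, threshold: float) -> bool:
    """Flag a test point as OOD when its score exceeds the InD-derived threshold."""
    return score > threshold
```

For InD scores 1, 2, ..., 20 and `tpr=0.95`, the threshold is 19, so 19 of the 20 InD scores are accepted as InD and a test score of 20 is flagged as OOD. No OOD samples are needed to set the threshold.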

Abstract

Deep neural networks (DNNs) are often constructed under the closed-world assumption and may therefore fail to generalize to out-of-distribution (OOD) data. As a result, DNNs can produce overconfident wrong predictions, with potentially disastrous consequences in safety-critical applications. Existing OOD detection methods mainly rely on a curated set of OOD data, used for model training or hyper-parameter tuning, to distinguish OOD data from the training data (also known as in-distribution or InD data). However, OOD samples are not always available during training in real-world applications, which limits the detection accuracy of such methods. To overcome this limitation, we propose a Gaussian-process-based OOD detection method that establishes a decision boundary from InD data only. The basic idea is to quantify the uncertainty of the unconstrained softmax scores of a DNN via a multi-class Gaussian process (GP), and then to define a score function that separates InD and potential OOD data based on their fundamental differences in the GP posterior predictive distribution. Two case studies, on conventional image classification datasets and on real-world image datasets, demonstrate that the proposed method outperforms state-of-the-art OOD detection methods when OOD samples are not observed in the training phase.
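
The abstract's score function compares posterior predictive distributions, and the TL;DR describes it as KL-divergence-based. The exact score is not given here, so the following is only a sketch under stated assumptions: each GP predictive distribution is a univariate Gaussian, and a test point is scored by its KL divergence from a hypothetical InD reference distribution for its predicted class. The function name `gaussian_kl` and the reference values are illustrative.

```python
import math


def gaussian_kl(mu1: float, var1: float, mu2: float, var2: float) -> float:
    """KL( N(mu1, var1) || N(mu2, var2) ) for univariate Gaussians.

    Assumption: N(mu1, var1) is a test point's GP predictive distribution and
    N(mu2, var2) is a hypothetical InD reference for the same class.
    """
    if var1 <= 0.0 or var2 <= 0.0:
        raise ValueError("variances must be positive")
    # Closed form: log(s2/s1) + (s1^2 + (mu1 - mu2)^2) / (2 s2^2) - 1/2.
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)
```

With a hypothetical InD reference N(5, 1) (larger mean, smaller variance, matching the Figure 2 description), an InD-like prediction N(4.8, 1.2) scores about 0.03, while an OOD-like prediction N(1, 4) scores about 8.81, so a threshold on this score separates the two.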

Paper Structure (11 sections, 1 theorem, 39 equations, 4 figures, 4 tables, 1 algorithm)

Key Result

Theorem 1

Suppose $\mathbf{x}'\in\mathcal{X}$ and $k=\arg\max_{l=1,\ldots,K} f_l(\mathbf{x}')$. Consider the squared exponential kernel defined in the paper's kernel-function equation, and assume that the hyper-parameters $\Theta_k$ and $\tau^2_k$ are fixed. Then $g(\mathbf{x}')=1$ if …, where $\boldsymbol{\Phi}_k=\Phi_k(\boldsymbol{\xi}(X_{\rm{GP}}^k), \boldsymbol{\xi}(X_{\rm{GP}}^k))$, and …
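
Theorem 1 relies on the GP posterior predictive distribution under a squared exponential kernel with fixed hyper-parameters. The sketch below computes that distribution in plain Python. The parameterization is our assumption, not taken from the source: correlation $\Phi(\mathbf{x},\mathbf{x}')=\exp\{-\sum_j \theta_j (x_j-x'_j)^2\}$ with $\Theta_k=(\theta_1,\ldots,\theta_d)$, covariance $\tau^2_k\Phi$, a known constant prior mean, and a small nugget added for numerical stability. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def se_corr(x: Sequence[float], y: Sequence[float], theta: Sequence[float]) -> float:
    """Squared exponential correlation exp(-sum_j theta_j * (x_j - y_j)^2)."""
    return math.exp(-sum(t * (a - b) ** 2 for t, a, b in zip(theta, x, y)))


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with a = L L^T; `a` must be symmetric positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def forward_solve(low: Matrix, b: Sequence[float]) -> List[float]:
    """Solve L y = b for y by forward substitution."""
    y: List[float] = []
    for i, bi in enumerate(b):
        y.append((bi - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y


def gp_predict(
    X: Sequence[Sequence[float]],
    z: Sequence[float],
    x_new: Sequence[float],
    theta: Sequence[float],
    tau2: float,
    mean: float = 0.0,
    nugget: float = 1e-8,
) -> Tuple[float, float]:
    """Posterior predictive mean and variance at x_new for a GP with covariance tau2 * Phi."""
    n = len(X)
    phi_mat = [
        [se_corr(X[i], X[j], theta) + (nugget if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    low = cholesky(phi_mat)
    phi_vec = [se_corr(x_new, xi, theta) for xi in X]
    # With Phi = L L^T: phi^T Phi^{-1} r = (L^{-1} phi)^T (L^{-1} r).
    v = forward_solve(low, phi_vec)
    w = forward_solve(low, [zi - mean for zi in z])
    mu = mean + sum(a * b for a, b in zip(v, w))
    # Clip tiny negative values caused by floating-point round-off.
    var = tau2 * max(0.0, 1.0 - sum(a * a for a in v))
    return mu, var
```

With training inputs 0, 1, 2, `theta=[1.0]` and `tau2=1.0`, the predictive variance is essentially zero at a training input and close to $\tau^2=1$ at $x=10$, where the mean also reverts to the prior mean of 0. This is the qualitative pattern behind the wider, lower-mean OOD predictive distributions described for Figure 2. In the paper's setting, one such GP would be fit per class $k$ on the features $\boldsymbol{\xi}(X_{\rm{GP}}^k)$; the toy 1-D inputs here stand in for those features.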

Figures (4)

  • Figure 1: Flowchart of uncertainty-aware OOD detection via a multi-class GP model.
  • Figure 2: Illustration of InD (in-distribution) predictive distributions (blue, solid lines) with smaller variances and larger means, compared to OOD (out-of-distribution) predictive distributions (red, dashed lines) with larger variances and smaller means. The x-axis represents the predicted values $f_k(\mathbf{x})|\mathbf{z}_{\rm{InD}}^k$, and the y-axis represents the density of the predictive distributions.
  • Figure 3: Visualization of the InD and OOD data from the conventional image classification datasets using t-SNE. The InD data ($\circ$) form 10 distinct clusters, corresponding to the 10 classes from the MNIST data. The OOD data ($\times$) are distinguishable from the InD data. Different colors represent different datasets.
  • Figure 4: Visualization of the InD and OOD data from the large-scale OOD learning studies using t-SNE. The InD data ($\circ$) form 10 distinct clusters, corresponding to the 10 classes from the ImageNet dataset. The majority of the OOD data ($\times$) are distinguishable from the InD data.

Theorems & Definitions (2)

  • Theorem 1
  • Proof