Out-of-Distribution Detection with Overlap Index

Hao Fu, Prashanth Krishnamurthy, Siddharth Garg, Farshad Khorrami

TL;DR

This work introduces an overlap index (OI)-based, non-parametric confidence score for out-of-distribution (OOD) detection that is lightweight and scalable to high-dimensional data. It derives a novel upper bound for OI between bounded distributions and converts it into a practical score via a simple algorithm using a set of indicator functions, enabling fast OOD decisions without covariance inversion or heavy training. Empirical results show competitive AUROC across UCI and large-scale CIFAR/ImageNet-like tasks, with significant speed and memory advantages over deep or Gaussian-based detectors; the method also extends to backdoor detection and benefits from pretrained features when available. The approach offers insensitivity to small distributional shifts and robustness to Huber ε-contamination, with potential for estimating both OI and model accuracy in targeted contexts, while recognizing limitations and the risk of adversarial exploitation. Overall, the OI-based detector provides a principled, efficient alternative for reliable OOD detection in real-world, open-world deployments.

Abstract

Out-of-distribution (OOD) detection is crucial for the deployment of machine learning models in the open world. While existing OOD detectors are effective in identifying OOD samples that deviate significantly from in-distribution (ID) data, they often come with trade-offs. For instance, deep OOD detectors usually suffer from high computational costs, require hyperparameter tuning, and have limited interpretability, whereas traditional OOD detectors may have low accuracy on large high-dimensional datasets. To address these limitations, we propose a novel and effective OOD detection approach that employs an overlap index (OI)-based confidence score function to evaluate the likelihood of a given input belonging to the same distribution as the available ID samples. The proposed OI-based confidence score function is non-parametric, lightweight, and easy to interpret, hence providing strong flexibility and generality. Extensive empirical evaluations indicate that our OI-based OOD detector is competitive with state-of-the-art OOD detectors in terms of detection accuracy on a wide range of datasets while requiring lower computation and memory costs. Lastly, we show that the proposed OI-based confidence score function inherits nice properties from OI (e.g., insensitivity to small distributional variations and robustness against Huber $ε$-contamination) and is a versatile tool for estimating OI and model accuracy in specific contexts.
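
To make the robustness property mentioned in the abstract concrete, the following is a minimal sketch, not the paper's confidence score. It assumes the standard definition of the overlap index for discrete distributions on a shared support (the mass the two distributions have in common, i.e. the sum of pointwise minima), which equals one minus the total variation distance. The distributions `p`, `q`, `r`, the value of `eps`, and all function names are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def overlap_index(p: Sequence[float], q: Sequence[float]) -> float:
    """Overlap index of two discrete distributions: sum of pointwise minima."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same support")
    return sum(min(a, b) for a, b in zip(p, q))


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance: half the L1 distance between the two pmfs."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same support")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def huber_contaminate(p: Sequence[float], r: Sequence[float], eps: float) -> list:
    """Huber eps-contamination of p by r: the mixture (1 - eps) * p + eps * r."""
    return [(1.0 - eps) * a + eps * b for a, b in zip(p, r)]


# Hypothetical toy distributions on a four-point support.
p = [0.5, 0.3, 0.2, 0.0]   # stand-in for the "clean" ID distribution
q = [0.1, 0.2, 0.3, 0.4]   # stand-in for a second distribution
r = [0.0, 0.0, 0.0, 1.0]   # contaminating distribution (a point mass)
eps = 0.05                 # contamination level

oi_clean = overlap_index(p, q)
oi_contaminated = overlap_index(huber_contaminate(p, r, eps), q)

print(f"OI(p, q)              = {oi_clean:.4f}")
print(f"1 - TVD(p, q)         = {1.0 - total_variation(p, q):.4f}")
print(f"OI(contaminated p, q) = {oi_contaminated:.4f}")
print(f"|change in OI|        = {abs(oi_contaminated - oi_clean):.4f} (eps = {eps})")
```

In this toy setting the overlap index moves by no more than `eps` under contamination, because the contaminated distribution is within total variation distance `eps` of the original. This illustrates the kind of robustness the abstract refers to, under the stated assumptions.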

Paper Structure

This paper contains 31 sections, 5 theorems, 14 equations, 9 figures, 11 tables, and 1 algorithm.

Key Result

Theorem 3.3

Without loss of generality, assume $D^+$ and $D^-$ are two probability distributions on a bounded domain $B \subset \mathbb{R}^n$ with defined norm $||\cdot||$ (i.e., $\sup_{x\in B} ||x|| < +\infty$), then ... where $r_A = \sup_{x\in A} ||x||$ and $r_{A^c} = \sup_{x \in A^c}||x||$, $\mu_{D^+}$ and $\mu_{D^-}$ ...

Note on the norm: this paper considers the $L_2$ norm. However, the analysis can be carried out using other norms.

Figures (9)

  • Figure 1: Histograms of confidence scores using $\overline{\eta}$, $\eta_1$, and $\eta_2$ with plane as the ID class and the other nine classes as the OOD class in CIFAR-10.
  • Figure 2: AUROC on UCI datasets. Horizontal dashed lines: the mean and standard deviation of our approach.
  • Figure 3: AUROC of our approach with (a) different numbers $m$ of available ID samples and (b) different numbers $k$ of condition functions.
  • Figure 4: Histograms of confidence scores using $\overline{\eta}$ and $\eta_2$ with plane as the ID class and the other nine classes as the OOD class in CIFAR-10.
  • Figure 5: Performance of our approach with different numbers ($k=10, 50, 200$) of condition functions for CIFAR-10 being ID data.
  • ...and 4 more figures

Theorems & Definitions (13)

  • Definition 3.1
  • Definition 3.2
  • Theorem 3.3
  • Corollary 3.4
  • Proposition 5.1
  • Proposition 5.2
  • Theorem 5.3
  • Definition 1: Total Variation Distance (TVD)
  • proof
  • proof
  • ...and 3 more