COMBOOD: A Semiparametric Approach for Detecting Out-of-distribution Data for Image Classification

Magesh Rajasekaran, Md Saiful Islam Sajol, Frej Berglind, Supratik Mukhopadhyay, Kamalika Das

TL;DR

The paper tackles the problem of identifying out-of-distribution data for image classification by introducing COMBOOD, a semi-parametric detector that fuses a non-parametric nearest-neighbor signal with a parametric regularized Mahalanobis distance. It uses two complementary feature extraction strategies (global extrema across layers and penultimate-layer embeddings) to form distance-based priors, which are combined through a simple fusion score, score = kc + mc. Empirical results on the OpenOOD benchmarks and a document dataset show COMBOOD achieving higher accuracy than competing methods with favorable inference times, and many of the improvements are statistically significant. The work offers a practical, scalable approach to OOD detection in real-world applications, notes limitations in how the priors are fused, and proposes data-driven extensions as future work.

Abstract

Identifying out-of-distribution (OOD) data at inference time is crucial for many machine learning applications, especially for automation. We present a novel unsupervised semi-parametric framework, COMBOOD, for OOD detection in image recognition. Our framework combines signals from two distance metrics, nearest-neighbor and Mahalanobis, to derive a confidence score that an inference point is out-of-distribution. The former provides a non-parametric approach to OOD detection. The latter provides a parametric, simple, yet effective method for detecting OOD data points, especially in the far-OOD scenario, where the inference point is far from the training data set in the embedding space. However, its performance is not satisfactory in the near-OOD scenarios that arise in practical situations. Our COMBOOD framework combines the two signals in a semi-parametric setting to provide a confidence score that is accurate for both the near-OOD and far-OOD scenarios. We show experimental results with the COMBOOD framework for different types of feature extraction strategies. We demonstrate experimentally that COMBOOD outperforms state-of-the-art OOD detection methods in terms of accuracy on the OpenOOD benchmark datasets (both version 1 and the most recent version 1.5, for both far-OOD and near-OOD) as well as on the documents dataset. On a majority of the benchmark datasets, the improvements in accuracy resulting from the COMBOOD framework are statistically significant. COMBOOD scales linearly with the size of the embedding space, making it ideal for many real-life applications.
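
To make the idea of combining the two distance signals concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single Gaussian fitted to all in-distribution features, a ridge-style regularizer (`reg` times the identity) on the covariance, Euclidean distance to the k-th nearest training point as the nearest-neighbor signal, and an unnormalized sum as the fusion. The function names, `reg`, and `k` are hypothetical; how the paper actually computes and scales kc and mc is not specified in this summary.

```python
import numpy as np

def fit_mahalanobis(train, reg=1e-3):
    """Estimate the mean and a regularized inverse covariance of training features."""
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)
    # Assumption: regularization is a ridge term reg * I added to the covariance,
    # which keeps the matrix invertible and well conditioned.
    cov_reg = cov + reg * np.eye(train.shape[1])
    return mean, np.linalg.inv(cov_reg)

def mahalanobis_distance(x, mean, cov_inv):
    """Parametric signal: Mahalanobis distance of each row of x to the training mean."""
    diff = x - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

def knn_distance(x, train, k=5):
    """Non-parametric signal: Euclidean distance to the k-th nearest training point."""
    dists = np.linalg.norm(x[:, None, :] - train[None, :, :], axis=2)
    return np.sort(dists, axis=1)[:, k - 1]

def combined_score(x, train, mean, cov_inv, k=5):
    """Assumed fusion: plain sum of the two distances; higher means more likely OOD."""
    kc = knn_distance(x, train, k)
    mc = mahalanobis_distance(x, mean, cov_inv)
    return kc + mc

# Toy usage with synthetic 4-dimensional "embeddings".
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 4))
mean, cov_inv = fit_mahalanobis(train)
queries = np.vstack([np.zeros((1, 4)), np.full((1, 4), 10.0)])
scores = combined_score(queries, train, mean, cov_inv)
print(scores)  # the second query lies far from the training data
```

In this toy setup the query at the origin sits inside the training cloud and the query at (10, 10, 10, 10) lies far outside it, so the second score is the larger one. In practice the two terms may live on different scales, so some normalization before summing is likely needed; that choice is left open here.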

Paper Structure

This paper contains 22 sections, 6 equations, 1 figure, 5 tables, 2 algorithms.

Figures (1)

  • Figure 1: AUROC of the regularized Mahalanobis detector for various regularization values. Scores were computed on the test sets of several datasets, such as CIFAR10, CIFAR100, and MNIST.
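
For readers unfamiliar with the metric in Figure 1, AUROC can be read as the probability that a randomly chosen OOD sample receives a higher OOD score than a randomly chosen in-distribution sample. The sketch below computes it directly from that pairwise definition. The function name and the label convention (1 for OOD, 0 for in-distribution) are assumptions for illustration and are not taken from the paper.

```python
def auroc(scores, labels):
    """AUROC from the pairwise (Mann-Whitney) definition.

    scores: OOD scores, where higher means more likely out-of-distribution.
    labels: 1 for OOD samples, 0 for in-distribution samples (assumed convention).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one OOD and one in-distribution sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # OOD sample correctly ranked above in-distribution
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Two in-distribution samples followed by two OOD samples.
print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The quadratic loop is fine for small examples; for full test sets a sort-based rank computation gives the same value more efficiently.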