Improving Out-of-Distribution Detection by Combining Existing Post-hoc Methods

Paul Novello, Yannick Prudent, Joseba Dalmau, Corentin Friedrich, Yann Pequignot

TL;DR

No single post-hoc OOD detector dominates across datasets, so the paper proposes fusing the scores of several existing detectors instead of designing a new one. It introduces four multivariate combination strategies (Majority Vote, Empirical CDF, Copula-based CDF, and Center-Outward Quantiles) and extends evaluation metrics such as AUROC to multidimensional detectors. In experiments on the OpenOOD benchmark with CIFAR-10, CIFAR-100, and ImageNet-200, it reports that score fusion consistently improves AUROC over the best individual detectors. It also gives practical guidelines for choosing which detectors to combine, both when OOD data is available and when it is not, relying on Outlier Exposure in the latter case. The approach is flexible, applies to different tasks, and comes with open-source code to ease adoption in safety-critical applications.

Abstract

Since the seminal paper of Hendrycks et al. (arXiv:1610.02136), post-hoc deep Out-of-Distribution (OOD) detection has expanded rapidly. As a result, practitioners working on safety-critical applications and seeking to improve the robustness of a neural network now have a plethora of methods to choose from. However, no method outperforms every other on every dataset (arXiv:2210.07242), so the current best practice is to test all the methods on the datasets at hand. This paper shifts focus from developing new methods to effectively combining existing ones to enhance OOD detection. We propose and compare four different strategies for integrating multiple detection scores into a unified OOD detector, based on techniques such as majority vote, empirical and copula-based Cumulative Distribution Function modeling, and multivariate quantiles based on optimal transport. We extend common OOD evaluation metrics (like AUROC and FPR at fixed TPR) to these multi-dimensional OOD detectors, allowing us to evaluate them and compare them with individual methods on extensive benchmarks. Furthermore, we propose a series of guidelines to choose which OOD detectors to combine in more realistic settings, i.e., in the absence of known OOD data, relying on principles drawn from Outlier Exposure (arXiv:1812.04606). The code is available at https://github.com/paulnovello/multi-ood.
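
To make the idea of combining scores concrete, here is a minimal sketch of two of the strategies named in the abstract, majority vote and an empirical multivariate CDF. It is not the authors' implementation (see their repository for that). The function names, the convention that a higher score means "more OOD", and the per-detector quantile thresholding are all assumptions made for illustration.

```python
from typing import List, Sequence


def per_detector_thresholds(id_scores: Sequence[Sequence[float]],
                            level: float = 0.95) -> List[float]:
    """Threshold each detector at the `level` empirical quantile of its ID scores.

    Assumption: id_scores[i][j] is the score of detector j on ID sample i,
    and higher scores mean "more likely OOD".
    """
    n = len(id_scores)
    d = len(id_scores[0])
    k = min(n - 1, int(level * n))  # index of the quantile in the sorted column
    return [sorted(row[j] for row in id_scores)[k] for j in range(d)]


def majority_vote(x: Sequence[float], thresholds: Sequence[float]) -> bool:
    """Flag x as OOD when more than half of the detectors exceed their threshold."""
    votes = sum(1 for s, t in zip(x, thresholds) if s > t)
    return 2 * votes > len(thresholds)


def empirical_cdf(x: Sequence[float],
                  id_scores: Sequence[Sequence[float]]) -> float:
    """Fraction of ID score vectors that are componentwise <= x.

    Used here as a single combined score: values close to 1 mean that x
    scores higher than almost all ID samples on every detector at once.
    """
    dominated = sum(
        1 for row in id_scores if all(r <= s for r, s in zip(row, x))
    )
    return dominated / len(id_scores)


if __name__ == "__main__":
    # Toy calibration set: 4 ID samples scored by 3 hypothetical detectors.
    id_scores = [
        [0.1, 0.2, 0.1],
        [0.2, 0.1, 0.3],
        [0.3, 0.3, 0.2],
        [0.4, 0.4, 0.4],
    ]
    thr = per_detector_thresholds(id_scores, level=0.5)
    print(thr)
    print(majority_vote([0.9, 0.9, 0.1], thr))
    print(empirical_cdf([0.25, 0.25, 0.35], id_scores))
```

Both functions reduce a vector of scores to one decision or one number, which is what lets a standard threshold sweep (and therefore metrics such as AUROC) be applied to the combined detector.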

Paper Structure (45 sections, 23 equations, 21 figures, 4 tables, 1 algorithm)

Figures (21)

  • Figure 1: Plot of ID (ImageNet-200) and OOD (Textures) scores for three selected OOD score functions. The scores are not always correlated (the three shown were picked because they are not, for illustration purposes), suggesting that a more sophisticated decision boundary could be built to improve OOD detection.
  • Figure 2: Visualization of sets $A_t$ for different combination methods with ImageNet-200 as ID data.
  • Figure 3: Scatter plot of the AUROC obtained for pairs of OOD detectors combined with the Center-Outward method. x-axis: AUROC on proxy OOD data (OE); y-axis: AUROC on the near-OOD and far-OOD OpenOOD benchmarks, with CIFAR-10 as in-distribution data.
  • Figure 4: Pareto fronts of individual methods and of sets of detectors combined with Copulas, where the sets are returned by a Sensitivity Analysis based on AUROCs obtained on Outlier Exposure datasets.
  • Figure 5: (Left) Source distribution: nested hyperspheres intersected with quadrant $\mathbb{R}^2_+$
  • ...and 16 more figures