Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection

Charles Guille-Escuret, Pierre-André Noël, Ioannis Mitliagkas, David Vazquez, Joao Monteiro

TL;DR

This work tackles the challenge of broad OOD detection by introducing BROAD, a diverse benchmark that spans five distribution-shift types and twelve datasets, using ImageNet-1K as the in-distribution reference. It critically evaluates a wide range of post-hoc detectors and reveals that performance is inconsistent across shifts, underscoring the need for broad evaluation. To address this, the authors propose a Gaussian Mixture Model ensemble that learns the joint distribution of detection scores, achieving superior and more stable broad OOD performance compared to individual methods. The results highlight the practicality of ensemble-based broad OOD detection and call for extensions to more in-distribution domains and modalities.

Abstract

Improving the reliability of deployed machine learning systems often involves developing methods to detect out-of-distribution (OOD) inputs. However, existing research often narrowly focuses on samples from classes that are absent from the training set, neglecting other types of plausible distribution shifts. This limitation reduces the applicability of these methods in real-world scenarios, where systems encounter a wide variety of anomalous inputs. In this study, we categorize five distinct types of distribution shifts and critically evaluate the performance of recent OOD detection methods on each of them. We publicly release our benchmark under the name BROAD (Benchmarking Resilience Over Anomaly Diversity). Our findings reveal that while these methods excel in detecting unknown classes, their performance is inconsistent when encountering other types of distribution shifts. In other words, they only reliably detect unexpected inputs that they have been specifically designed to expect. As a first step toward broad OOD detection, we learn a generative model of existing detection scores with a Gaussian mixture. By doing so, we present an ensemble approach that offers a more consistent and comprehensive solution for broad OOD detection, demonstrating superior performance compared to existing methods. Our code to download BROAD and reproduce our experiments is publicly available.
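The ensemble idea in the abstract can be illustrated with a minimal sketch: stack several detection scores per input into a vector, fit a Gaussian mixture to those vectors on in-distribution data, and use the negative log-likelihood under the mixture as a single anomaly score. Everything below is an assumption made for illustration rather than the authors' implementation: the scores are synthetic, the function names (`fit_gmm`, `component_log_probs`, `ensemble_score`, `auroc`) are hypothetical, the mixture is fitted with plain EM in NumPy, and the number of components is arbitrary.

```python
import numpy as np

def component_log_probs(x, weights, means, covs):
    """Return log(weight_k * N(x | mean_k, cov_k)) with shape (n, k)."""
    n, d = x.shape
    out = np.empty((n, len(weights)))
    for k in range(len(weights)):
        diff = x - means[k]
        chol = np.linalg.cholesky(covs[k])
        sol = np.linalg.solve(chol, diff.T)       # whitened residuals, shape (d, n)
        maha = (sol ** 2).sum(axis=0)             # squared Mahalanobis distance
        log_det = 2.0 * np.log(np.diag(chol)).sum()
        out[:, k] = np.log(weights[k]) - 0.5 * (d * np.log(2 * np.pi) + log_det + maha)
    return out

def fit_gmm(x, n_components=2, n_iter=50, reg=1e-6, seed=0):
    """Fit a full-covariance Gaussian mixture to the rows of x with plain EM."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    means = x[rng.choice(n, n_components, replace=False)]
    covs = np.stack([np.cov(x, rowvar=False) + reg * np.eye(d)] * n_components)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities, computed stably in log space.
        log_p = component_log_probs(x, weights, means, covs)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and covariances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ x) / nk[:, None]
        for k in range(n_components):
            diff = x - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + reg * np.eye(d)
    return weights, means, covs

def ensemble_score(x, params):
    """Negative log-likelihood under the mixture: higher means more anomalous."""
    log_p = component_log_probs(x, *params)
    m = log_p.max(axis=1, keepdims=True)
    return -(m[:, 0] + np.log(np.exp(log_p - m).sum(axis=1)))

def auroc(neg, pos):
    """Probability that a random positive scores above a random negative."""
    return (pos[:, None] > neg[None, :]).mean()

# Synthetic stand-ins for three correlated detection scores (NOT real detector outputs).
rng = np.random.default_rng(0)
mix = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 1.0]])
id_train = rng.normal(size=(2000, 3)) @ mix
id_test = rng.normal(size=(500, 3)) @ mix
# Hypothetical shift that moves only the third score.
ood_test = rng.normal(size=(500, 3)) @ mix + np.array([0.0, 0.0, 4.0])

params = fit_gmm(id_train)
auc = auroc(ensemble_score(id_test, params), ensemble_score(ood_test, params))
print(f"AUROC of the mixture-based ensemble score: {auc:.3f}")
```

In this toy setup the mixture is fitted on in-distribution scores only, so no anomalous examples are needed at training time. Because the density is modeled jointly, an input can be flagged when any single score, or an unusual combination of scores, departs from what was seen in distribution.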

Paper Structure (15 sections, 3 figures, 15 tables)

Figures (3)

  • Figure 1: An overview of BROAD, illustrating the benchmarks employed for each distribution shift category, with ImageNet-1K serving as the in-distribution reference.
  • Figure 2: Score distributions of MSP, ViM, and MDS across datasets. While all methods discriminate between ImageNet and iNaturalist, their effectiveness fluctuates across the other types of distribution shifts described in Section \ref{sec:distribution_shifts}.
  • Figure 3: Covariance matrices of detection scores in-distribution for ViT (left) and ResNet-50 (right).