Table of Contents
Fetching ...

MultiOOD: Scaling Out-of-Distribution Detection for Multiple Modalities

Hao Dong, Yue Zhao, Eleni Chatzi, Olga Fink

TL;DR

This work introduces the first-of-its-kind benchmark, MultiOOD, characterized by diverse dataset sizes and varying modality combinations, and introduces a novel outlier synthesis method, NP-Mix, which explores broader feature spaces by leveraging the information from nearest neighbor classes and complements A2D to strengthen OOD detection performance.

Abstract

Detecting out-of-distribution (OOD) samples is important for deploying machine learning models in safety-critical applications such as autonomous driving and robot-assisted surgery. Existing research has mainly focused on unimodal scenarios on image data. However, real-world applications are inherently multimodal, which makes it essential to leverage information from multiple modalities to enhance the efficacy of OOD detection. To establish a foundation for more realistic Multimodal OOD Detection, we introduce the first-of-its-kind benchmark, MultiOOD, characterized by diverse dataset sizes and varying modality combinations. We first evaluate existing unimodal OOD detection algorithms on MultiOOD, observing that the mere inclusion of additional modalities yields substantial improvements. This underscores the importance of utilizing multiple modalities for OOD detection. Based on the observation of Modality Prediction Discrepancy between in-distribution (ID) and OOD data, and its strong correlation with OOD performance, we propose the Agree-to-Disagree (A2D) algorithm to encourage such discrepancy during training. Moreover, we introduce a novel outlier synthesis method, NP-Mix, which explores broader feature spaces by leveraging the information from nearest neighbor classes and complements A2D to strengthen OOD detection performance. Extensive experiments on MultiOOD demonstrate that training with A2D and NP-Mix improves existing OOD detection algorithms by a large margin. Our source code and MultiOOD benchmark are available at https://github.com/donghao51/MultiOOD.

MultiOOD: Scaling Out-of-Distribution Detection for Multiple Modalities

TL;DR

This work introduces the first-of-its-kind benchmark, MultiOOD, characterized by diverse dataset sizes and varying modality combinations, and introduces a novel outlier synthesis method, NP-Mix, which explores broader feature spaces by leveraging the information from nearest neighbor classes and complements A2D to strengthen OOD detection performance.

Abstract

Detecting out-of-distribution (OOD) samples is important for deploying machine learning models in safety-critical applications such as autonomous driving and robot-assisted surgery. Existing research has mainly focused on unimodal scenarios on image data. However, real-world applications are inherently multimodal, which makes it essential to leverage information from multiple modalities to enhance the efficacy of OOD detection. To establish a foundation for more realistic Multimodal OOD Detection, we introduce the first-of-its-kind benchmark, MultiOOD, characterized by diverse dataset sizes and varying modality combinations. We first evaluate existing unimodal OOD detection algorithms on MultiOOD, observing that the mere inclusion of additional modalities yields substantial improvements. This underscores the importance of utilizing multiple modalities for OOD detection. Based on the observation of Modality Prediction Discrepancy between in-distribution (ID) and OOD data, and its strong correlation with OOD performance, we propose the Agree-to-Disagree (A2D) algorithm to encourage such discrepancy during training. Moreover, we introduce a novel outlier synthesis method, NP-Mix, which explores broader feature spaces by leveraging the information from nearest neighbor classes and complements A2D to strengthen OOD detection performance. Extensive experiments on MultiOOD demonstrate that training with A2D and NP-Mix improves existing OOD detection algorithms by a large margin. Our source code and MultiOOD benchmark are available at https://github.com/donghao51/MultiOOD.
Paper Structure (28 sections, 15 equations, 13 figures, 16 tables)

This paper contains 28 sections, 15 equations, 13 figures, 16 tables.

Figures (13)

  • Figure 1: The FPR95 (lower is better) and AUROC (higher is better) on HMDB51 dataset across various modalities. Multimodal OOD substantially improves unimodal OOD w/o bells and whistles.
  • Figure 2: An overview of MultiOOD Benchmark.
  • Figure 3: An example of softmax outputs for ID and OOD data. The predictions from video and optical flow demonstrate uniformity across ID data and exhibit variability across OOD data.
  • Figure 4: Average prediction $L_1$ distances between ID and OOD data ($l_{OOD}-l_{ID}$) before and after A2D training across various datasets within the MultiOOD, where Energy energyood20nips is used as score function. The distances are highly correlated to the ultimate OOD performance. A2D training amplifies such distances, consequently enhancing the efficacy of OOD detection.
  • Figure 5: An overview of the proposed framework for Multimodal OOD Detection. We introduce A2D algorithm to encourage enlarging the prediction discrepancy across modalities. Additionally, we propose a novel outlier synthesis algorithm, NP-Mix, designed to explore broader feature spaces, which complements A2D to strengthen the OOD detection performance.
  • ...and 8 more figures