M3OOD: Automatic Selection of Multimodal OOD Detectors

Yuehan Qin, Li Li, Defu Cao, Tiankai Yang, Jiate Li, Yue Zhao

TL;DR

M3OOD is a meta-learning-based framework for selecting OOD detectors in multimodal settings. It represents datasets by combining multimodal embeddings with handcrafted meta-features that capture distributional and cross-modal characteristics.

Abstract

Out-of-distribution (OOD) robustness is a critical challenge for modern machine learning systems, particularly as they increasingly operate in multimodal settings involving inputs such as video, audio, and sensor data. Many OOD detection methods have been proposed, each with a different design targeting different distribution shifts. No single OOD detector is likely to prevail across all scenarios, which raises the question: how can we automatically select a suitable OOD detection model for a given distribution shift? Because OOD detection is inherently unsupervised, it is difficult to predict model performance and to find a universally best model. Systematically comparing models on new, unseen data is also costly or even impractical. To address this challenge, we introduce M3OOD, a meta-learning-based framework for OOD detector selection in multimodal settings. Meta-learning offers a solution by learning from historical model behaviors, enabling rapid adaptation to new data distribution shifts with minimal supervision. Our approach represents datasets by combining multimodal embeddings with handcrafted meta-features that capture distributional and cross-modal characteristics. By leveraging historical performance across diverse multimodal benchmarks, M3OOD can recommend suitable detectors for a new data distribution shift. Experimental evaluation demonstrates that M3OOD consistently outperforms 10 competitive baselines across 12 test scenarios with minimal computational overhead.
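The selection idea in the abstract (learn from historical detector performance, then recommend a detector for a new dataset from its meta-features) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the dataset and detector names, the two-dimensional meta-feature vectors, and the scores are toy values, and a simple nearest-neighbour average stands in for the meta-predictor $f$. It is not the paper's predictor, which also takes model embeddings as input.

```python
from math import sqrt

# Hypothetical toy data; none of these names or numbers come from the paper.
# Meta-feature vector per historical dataset (in M3OOD these would combine
# multimodal embeddings with handcrafted meta-features).
train_meta_features = {
    "hist_A": [0.90, 0.10],
    "hist_B": [0.20, 0.80],
    "hist_C": [0.85, 0.20],
}

# Historical performance: dataset -> detector -> score (higher is better).
performance = {
    "hist_A": {"det_1": 0.91, "det_2": 0.78},
    "hist_B": {"det_1": 0.70, "det_2": 0.88},
    "hist_C": {"det_1": 0.89, "det_2": 0.80},
}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predict_scores(new_features, k=2):
    """Nearest-neighbour stand-in for f: average each detector's
    historical score over the k most similar historical datasets."""
    nearest = sorted(
        train_meta_features,
        key=lambda name: euclidean(train_meta_features[name], new_features),
    )[:k]
    detectors = performance[nearest[0]].keys()
    return {
        det: sum(performance[name][det] for name in nearest) / len(nearest)
        for det in detectors
    }


def select_detector(new_features, k=2):
    """Recommend the detector with the highest predicted score."""
    scores = predict_scores(new_features, k)
    return max(scores, key=scores.get)


# Online phase: only the new dataset's meta-features are needed,
# no detector is actually run on the new data.
selected = select_detector([0.80, 0.15])
print(selected)
```

The point of the sketch is the division of labour: the expensive detector evaluations happen once, offline, and the online step only computes meta-features and queries the predictor.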

Paper Structure

This paper contains 33 sections, 6 equations, 6 figures, 7 tables, 2 algorithms.

Figures (6)

  • Figure 1: M3OOD overview. Top: the offline meta-training phase, whose key step is training a meta performance predictor $f$ that maps language embeddings of the datasets and models to their performance $\mathbf{P}$. Bottom: the online model selection phase, which transfers the meta-predictor $f$ to predict the performance of each OOD detector on the test data and selects a detector accordingly.
  • Figure 2: Boxplot of the rank distribution of M3OOD and baselines (lower is better). M3OOD has the lowest, i.e., best, ranks.
  • Figure 3: Average rank (lower is better) of methods w.r.t. performance across datasets; M3OOD outperforms all baselines with the lowest rank.
  • Figure 4: Left: ablation study on different choices of meta-predictor $f$. Tree-based models have better performance. Right: ablation study on different meta embeddings. M3OOD has better performance over its variants.
  • Figure 5: Runtime of M3OOD components vs. time required for multimodal OOD detection on the HMDB dataset. M3OOD incurs a small overhead.
  • ...and 1 more figure
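Figures 2 and 3 compare methods by rank, where lower is better. As a rough illustration of how an average rank across scenarios can be computed, here is a minimal sketch. The method names, scenario names, and scores are hypothetical, and the tie handling (tied methods share the mean of their positions) is an assumption; the ranking protocol the paper actually uses is not described in this section.

```python
def rank_descending(scores):
    """Rank methods within one scenario: rank 1 is the highest score.
    Tied methods share the mean of the positions they occupy."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j to cover every method tied with the one at position i.
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared_rank = ((i + 1) + (j + 1)) / 2
        for name in ordered[i:j + 1]:
            ranks[name] = shared_rank
        i = j + 1
    return ranks


def average_rank(results):
    """Average each method's rank over all scenarios (lower is better)."""
    totals = {}
    for per_scenario in results.values():
        for name, rank in rank_descending(per_scenario).items():
            totals[name] = totals.get(name, 0.0) + rank
    return {name: total / len(results) for name, total in totals.items()}


# Hypothetical scores: scenario -> method -> performance (higher is better).
toy_results = {
    "scenario_1": {"method_a": 0.90, "method_b": 0.85, "method_c": 0.80},
    "scenario_2": {"method_a": 0.70, "method_b": 0.75, "method_c": 0.60},
}

avg = average_rank(toy_results)
print(avg)
```

Ranking within each scenario before averaging keeps scenarios with very different score scales from dominating the comparison, which is one common reason to report ranks rather than raw averages.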