MetaOOD: Automatic Selection of OOD Detection Models

Yuehan Qin; Yichi Zhang; Yi Nian; Xueying Ding; Yue Zhao

TL;DR

MetaOOD addresses the problem of selecting an out-of-distribution (OOD) detector without labeled test data by learning a meta-model that maps language-embedding representations of dataset pairs and detectors to their past performance in a matrix $P$. It performs offline meta-training to learn a predictor $f$ over embeddings $E^{data}$ and $E^{model}$ and applies this zero-shot at test time to choose the best detector for a new dataset pair $D_{new}$. Empirical results on 24 dataset pairs with 11 detectors show MetaOOD outperforms all baselines with statistical significance and only marginal runtime overhead, demonstrating robust, data-efficient OOD detector selection for open-world applications.
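The selection loop summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the embeddings are hand-made toy vectors rather than language-model embeddings, the performance matrix `P` holds invented values, and the predictor `f` is replaced by a plain k-nearest-neighbour average so the example needs only the standard library. All function names are hypothetical.

```python
import math

def concat(e_data, e_model):
    """Join one dataset embedding and one model embedding into a single input."""
    return list(e_data) + list(e_model)

def fit_meta_predictor(E_data, E_model, P):
    """Offline meta-training stand-in: store one (input, performance) row
    for every historical (dataset pair i, detector j) combination."""
    rows = []
    for i, e_d in enumerate(E_data):
        for j, e_m in enumerate(E_model):
            rows.append((concat(e_d, e_m), P[i][j]))
    return rows

def predict(rows, e_data, e_model, k=2):
    """Stand-in for f: mean performance of the k nearest stored rows."""
    x = concat(e_data, e_model)
    nearest = sorted(rows, key=lambda r: math.dist(r[0], x))[:k]
    return sum(p for _, p in nearest) / len(nearest)

def select_detector(rows, e_new, E_model, k=2):
    """Online selection: score every detector on the new dataset pair
    without labels and return the index with the highest prediction."""
    scores = [predict(rows, e_new, e_m, k) for e_m in E_model]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy inputs, invented for illustration only (not values from the paper).
E_data = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]   # 3 historical dataset pairs
E_model = [[1.0, 0.0], [0.0, 1.0]]              # 2 candidate detectors
P = [[0.90, 0.60],                              # P[i][j]: detector j on pair i
     [0.50, 0.80],
     [0.55, 0.85]]

rows = fit_meta_predictor(E_data, E_model, P)
e_new = [0.95, 0.05]                            # embedding of D_new
best, scores = select_detector(rows, e_new, E_model)
print(best, scores)
```

Because `e_new` lies close to the second and third historical pairs, the sketch favours the detector that did well on those pairs. No labels from the new dataset pair are used at any point.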

Abstract

How can we automatically select an out-of-distribution (OOD) detection model for various underlying tasks? This is crucial for maintaining the reliability of open-world applications by identifying data distribution shifts, particularly in critical domains such as online transactions, autonomous driving, and real-time patient diagnosis. Despite the availability of numerous OOD detection methods, the challenge of selecting an optimal model for diverse tasks remains largely underexplored, especially in scenarios lacking ground truth labels. In this work, we introduce MetaOOD, the first zero-shot, unsupervised framework that utilizes meta-learning to select an OOD detection model automatically. As a meta-learning approach, MetaOOD leverages historical performance data of existing methods across various benchmark OOD detection datasets, enabling the effective selection of a suitable model for new datasets without the need for labeled data at test time. To quantify task similarities more accurately, we introduce language model-based embeddings that capture the distinctive OOD characteristics of both datasets and detection models. Through extensive experimentation with 24 unique test dataset pairs to choose from among 11 OOD detection models, we demonstrate that MetaOOD significantly outperforms existing methods and adds only marginal time overhead. Our results, validated by Wilcoxon statistical tests, show that MetaOOD surpasses a diverse group of 11 baselines, including established OOD detectors and advanced unsupervised selection methods.
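As a hedged illustration of the kind of paired significance check mentioned above, the following sketch computes a Wilcoxon signed-rank statistic and an exact one-sided p-value by enumerating sign assignments. The exact test variant used in the paper is not stated here, so the one-sided form, the function name, and the two score lists (invented for illustration) are all assumptions.

```python
from itertools import product

def signed_rank_test(a, b):
    """Return (W_plus, exact one-sided p-value) for the alternative a > b.

    a and b are paired scores, one pair per dataset.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    n = len(diffs)
    # Rank absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under the null hypothesis every sign pattern is equally likely, so the
    # p-value is the share of patterns whose positive-rank sum is >= W_plus.
    hits = sum(
        1
        for signs in product((0, 1), repeat=n)
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus
    )
    return w_plus, hits / 2 ** n

# Hypothetical per-dataset scores for a selector and one baseline.
selector = [0.90, 0.85, 0.70, 0.88, 0.95, 0.80]
baseline = [0.80, 0.82, 0.72, 0.81, 0.90, 0.79]

w_plus, p_value = signed_rank_test(selector, baseline)
print(w_plus, p_value)
```

The enumeration grows as 2^n, so this form only suits a small number of pairs; for larger samples a library routine such as `scipy.stats.wilcoxon` would be the usual choice.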

Paper Structure

This paper contains 31 sections, 3 equations, 10 figures, 8 tables, 2 algorithms.

Introduction
Related Work
Unsupervised OOD Detection Model Selection
Supervised OOD Detection Model Selection
Represent Datasets and Models as Embeddings for Meta-learning
MetaOOD for OOD Detection Model Selection
Preliminaries on OOD Detection
Problem Statement and Framework Overview
Offline Meta-Training
Data and Model Embeddings
Online Model Selection
Experiments
Experiment Setting
Model Selection Baselines
Overall Results
...and 16 more sections

Figures (10)

Figure 1: MetaOOD overview (see "Problem Statement and Framework Overview"). The offline meta-training phase is shown at the top (see "Offline Meta-Training"): the key is to train a meta performance predictor $f$ to map language embeddings of the datasets and models to their performance $\mathbf{P}$. The online model selection phase (see "Online Model Selection") is shown at the bottom: the meta-predictor $f$ is transferred to predict the performance of each OOD detector on the test data, and these predictions drive the selection.
Figure 2: Average rank (lower is better) of methods w.r.t. performance across datasets; MetaOOD outperforms all baselines with the lowest rank.
Figure 3: Boxplot of the rank distribution of MetaOOD and baselines (the lower, the better). MetaOOD is the lowest/best.
Figure 4: Ablation study on different data and model embeddings. MetaOOD has better performance over its variants.
Figure 5: Ablation study on different choices of meta-predictor $f$. Tree-based models have better performance.
...and 5 more figures
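
The average-rank metric reported in Figures 2 and 3 can be sketched as follows: on each dataset pair the methods are ranked by performance (rank 1 is best), and each method's ranks are then averaged across dataset pairs. This is a minimal sketch under assumptions: the tie handling (shared average rank), the function names, and the toy performance table are illustrative and not taken from the paper.

```python
def rank_row(scores):
    """Rank methods on one dataset pair: rank 1 = highest score.
    Tied scores share the average of the ranks they span."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def average_ranks(perf):
    """perf[d][m] is the performance of method m on dataset pair d.
    Returns one average rank per method (lower is better)."""
    per_row = [rank_row(row) for row in perf]
    n_methods = len(perf[0])
    return [sum(r[m] for r in per_row) / len(perf) for m in range(n_methods)]

# Toy table, invented for illustration: 3 dataset pairs x 3 methods.
perf = [
    [0.9, 0.8, 0.7],
    [0.6, 0.9, 0.9],   # the last two methods tie on this pair
    [0.8, 0.7, 0.6],
]
avg = average_ranks(perf)
print(avg)
```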
