Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection

Chentao Cao; Zhun Zhong; Zhanke Zhou; Yang Liu; Tongliang Liu; Bo Han

TL;DR

To address open-world OOD detection without access to OOD data, this work proposes Envisioning Outlier Exposure (EOE), which uses large-language-model prompts to generate potential outlier labels guided by a visual similarity rule and integrates them with CLIP embeddings through a novel OOD score, S_EOE. The approach supports far, near, and fine-grained OOD tasks and scales to large label spaces like ImageNet-1K, achieving state-of-the-art or competitive results across diverse benchmarks without extra training data. Key contributions include task-specific LLM prompts, a penalty-based scoring function, and extensive ablations showing robustness to LLM choice, prompt design, and backbone, as well as qualitative visualizations explaining EOE’s effectiveness. The work demonstrates that incorporating envisioned outliers from LLMs can significantly enhance open-world OOD detection in a zero-shot, scalable manner.

Abstract

Detecting out-of-distribution (OOD) samples is essential when deploying machine learning models in open-world scenarios. Zero-shot OOD detection, which requires no training on in-distribution (ID) data, has become possible with the advent of vision-language models like CLIP. Existing methods build a text-based classifier with only closed-set labels. However, this largely restricts the inherent capability of CLIP to recognize samples from a large and open label space. In this paper, we propose to tackle this constraint by leveraging the expert knowledge and reasoning capability of large language models (LLMs) to Envision potential Outlier Exposure, termed EOE, without access to any actual OOD data. Owing to its better adaptation to open-world scenarios, EOE generalizes to different tasks, including far, near, and fine-grained OOD detection. Technically, we design (1) LLM prompts based on visual similarity to generate potential outlier class labels specialized for OOD detection, and (2) a new score function based on a potential outlier penalty to distinguish hard OOD samples effectively. Empirically, EOE achieves state-of-the-art performance across different OOD tasks and scales effectively to the ImageNet-1K dataset. The code is publicly available at: https://github.com/tmlr-group/EOE.
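To make the idea of a score "based on a potential outlier penalty" concrete, the following is a minimal sketch, not the paper's exact definition of S_EOE. It assumes the image-text similarities have already been computed (for example, CLIP cosine similarities), applies a softmax jointly over ID and envisioned outlier labels, and subtracts a weighted maximum outlier probability from the maximum ID probability. The function name `penalized_ood_score` and the parameters `temperature` and `beta` are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def penalized_ood_score(id_sims, outlier_sims, temperature=1.0, beta=0.25):
    """Illustrative penalty-style score (assumed form, not the official S_EOE).

    id_sims:      similarities between one image and each ID class label.
    outlier_sims: similarities between the image and each envisioned
                  outlier class label (may be empty).
    Returns a float; a higher value suggests the image is in-distribution.
    """
    if not id_sims:
        raise ValueError("at least one ID similarity is required")
    # Softmax is taken jointly over ID and outlier labels, so a strong
    # outlier match lowers the probability mass left for ID classes.
    logits = [s / temperature for s in list(id_sims) + list(outlier_sims)]
    probs = softmax(logits)
    k = len(id_sims)
    best_id = max(probs[:k])
    best_outlier = max(probs[k:]) if outlier_sims else 0.0
    # Penalize the ID confidence by the best-matching outlier label.
    return best_id - beta * best_outlier


# Hypothetical similarities: the first image matches an ID class,
# the second matches an envisioned outlier class.
score_id_like = penalized_ood_score([0.9, 0.1], [0.2, 0.1], temperature=0.1)
score_ood_like = penalized_ood_score([0.2, 0.1], [0.9, 0.1], temperature=0.1)
```

In this sketch an image would be flagged as OOD when its score falls below a chosen threshold. With no outlier labels the score reduces to the maximum softmax probability over ID classes, which corresponds to the closed-set baseline the abstract contrasts against.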

Paper Structure (31 sections, 10 equations, 18 figures, 25 tables, 1 algorithm)

Introduction
Preliminaries
Envisioning Outlier Exposure for Zero-shot OOD Detection
Acquiring Envisioned Outlier Class Labels
A New OOD Detection Score
Experiments
Setups
Main Results
Ablation Study
Further Analysis
Related Works
Conclusion
Further Analysis
Understanding EOE's Effectiveness: Without Hitting Actual OOD Classes
Further Analysis on the Design of $S_\text{EOE}$
...and 16 more sections

Figures (18)

Figure 1: Comparison of zero-shot OOD detection score distributions. Compared to the model using (a) only closed-set ID classes, (b) adding actual OOD class labels can largely increase the OOD detection performance. (c) By adding the outlier classes generated by our method, the OOD detection results can also be significantly improved without using the actual OOD class labels. We use CUB-200-2011 (Wah et al., 2011) as ID classes and Places (Zhou et al., 2017) as OOD classes.
Figure 2: The framework of the proposed EOE. Given a set of ID class labels $\mathcal{Y}_\text{id}$, we first leverage the designed prompts to generate a set of outlier class labels, $\mathcal{Y}_\text{outlier}$, by using an LLM. Then, we input both the ID and generated outlier class labels into the text encoder to build the textual classifier. During the test stage, given an input image, we obtain the visual feature from the image encoder and calculate the similarities between the visual feature and the textual classifier. Finally, the OOD score is obtained by scaling the similarities with the proposed detection score function $S_{\text{EOE}}$.
Figure 3: LLM prompt for far OOD detection, consisting of both the contents of Q and A.
Figure 4: LLM prompt for near OOD detection.
Figure 5: LLM prompt for fine-grained OOD detection.
...and 13 more figures