
Zero-Shot Out-of-Distribution Detection with Outlier Label Exposure

Choubo Ding, Guansong Pang

TL;DR

This paper addresses zero-shot out-of-distribution detection for vision-language models by introducing Outlier Label Exposure (OLE), which prompts CLIP with a large set of auxiliary outlier labels to provide prior OOD knowledge. To make this scalable and robust, it develops Outlier Prototype Learning (OPL) to compress noisy outlier labels into a small set of prototypes, and Hard Outlier Prototype Generation (HOPG) to synthesize in-between prototypes that tighten the ID/OOD boundary. Inference integrates these prototypes into the CLIPN scoring, significantly improving OOD discrimination and achieving state-of-the-art results on large-scale and hard OOD benchmarks. The approach remains lightweight, requiring no training data from ID distributions, and demonstrates strong generalization across diverse OOD datasets with two different outlier label sets. This work offers a practical, data-efficient pathway to safer zero-shot deployment of VLMs in open-world settings.

Abstract

As vision-language models like CLIP are widely applied to zero-shot tasks and achieve remarkable performance on in-distribution (ID) data, detecting and rejecting out-of-distribution (OOD) inputs in the zero-shot setting has become crucial for ensuring the safety of using such models on the fly. Most existing zero-shot OOD detectors rely on ID class label-based prompts to guide CLIP in classifying ID images and rejecting OOD images. In this work, we instead propose to leverage a large set of diverse auxiliary outlier class labels as pseudo-OOD class text prompts to CLIP for enhancing zero-shot OOD detection, an approach we call Outlier Label Exposure (OLE). The key intuition is that ID images are expected to have lower similarity to these outlier class prompts than OOD images. One issue is that raw class labels often include noisy labels, e.g., synonyms of ID labels, rendering raw OLE-based detection ineffective. To address this issue, we introduce an outlier prototype learning module that uses the prompt embeddings of the outlier labels to learn a small set of pivotal outlier prototypes for embedding similarity-based OOD scoring. Additionally, the outlier classes and their prototypes can be only loosely coupled with the ID classes, leaving an inseparable decision region between them. Thus, we also introduce an outlier label generation module that combines our outlier prototypes with ID class embeddings to generate in-between outlier prototypes that further calibrate the detection in OLE. Despite its simplicity, extensive experiments show that OLE substantially improves detection performance and achieves new state-of-the-art performance on large-scale OOD and hard OOD detection benchmarks.
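The scoring intuition in the abstract can be illustrated with a small sketch. It is not the paper's exact scoring rule: the softmax-over-all-prompts form, the temperature value, the helper names (`normalize`, `cosine`, `id_score`), and the toy 3-dimensional embeddings are all assumptions made for illustration. In practice the embeddings would come from CLIP's image and text encoders.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    u, v = normalize(u), normalize(v)
    return sum(a * b for a, b in zip(u, v))

def id_score(image_emb, id_embs, outlier_embs, temperature=0.01):
    """Softmax mass on ID prompts when outlier prompts compete (assumed form).

    A high value suggests an ID image; a low value suggests an OOD image,
    because the outlier prompts absorb most of the probability mass.
    """
    logits = [cosine(image_emb, t) / temperature for t in id_embs + outlier_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    return sum(exps[:len(id_embs)]) / sum(exps)

# Toy embeddings (hypothetical): two ID prompts and one outlier prototype.
id_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
outlier_embs = [[0.0, 0.0, 1.0]]

score_id_image = id_score([0.9, 0.1, 0.1], id_embs, outlier_embs)
score_ood_image = id_score([0.1, 0.1, 0.9], id_embs, outlier_embs)
```

Here the image close to an ID prompt receives a score near 1, while the image close to the outlier prototype receives a score near 0, which is the separation the outlier prompts are meant to provide.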

Paper Structure (13 sections, 7 equations, 5 figures, 3 tables)


Figures (5)

  • Figure 1: (a) Zero-shot classifiers do not consider OOD inputs; all OOD inputs are predicted as ID classes. (b) Current zero-shot OOD detectors help alleviate this issue but they often have overconfident predictions of OOD samples due to the lack of knowledge about OOD data. (c) Our method OLE mitigates this problem by utilizing outlier class labels via text prompts to enable more OOD-informed detection.
  • Figure 2: An overall framework of the proposed approach OLE. (a) presents a high-level pipeline for detecting OOD inputs through outlier label exposure in CLIP. (b) shows the process of the OPL and HOPG modules.
  • Figure 3: t-SNE visualization of the ID class prompt embeddings of ImageNet-1K. Orange points denote the selected fringe ID classes. (a) The selected fringe ID class embeddings that have the largest average distance to all ID class embeddings. (b) The fringe ID class embeddings are identified within their clusters.
  • Figure 4: ID score density of OLE and its variants with iNaturalist as OOD data. The overlapping regions are shown in grey.
  • Figure 5: Sensitivity analysis w.r.t. (Top) the number of outlier prototypes and (Bottom) the $p$-th percentile in our OPL module.
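The fringe-class selection described in the Figure 3(a) caption, together with the in-between prototypes mentioned in the abstract, can be sketched roughly as follows. This is a guess at the mechanics rather than the authors' implementation: cosine distance as the metric, the convex-combination mixing, the weight `lam`, and the function names `fringe_indices` and `in_between` are all assumptions, and the 2-dimensional embeddings are toy values.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_distance(u, v):
    """One minus cosine similarity (assumed distance metric)."""
    return 1.0 - sum(a * b for a, b in zip(normalize(u), normalize(v)))

def fringe_indices(id_embs, k):
    """Indices of the k ID embeddings with the largest mean distance to all ID embeddings."""
    mean_dist = [
        sum(cosine_distance(e, other) for other in id_embs) / len(id_embs)
        for e in id_embs
    ]
    order = sorted(range(len(id_embs)), key=lambda i: mean_dist[i], reverse=True)
    return order[:k]

def in_between(id_emb, outlier_proto, lam=0.5):
    """Mix an ID embedding with an outlier prototype and re-normalize (assumed form).

    Note: exactly opposite vectors with lam=0.5 would mix to zero and cannot be normalized.
    """
    a, b = normalize(id_emb), normalize(outlier_proto)
    return normalize([lam * x + (1.0 - lam) * y for x, y in zip(a, b)])

# Toy ID prompt embeddings (hypothetical): the last one sits far from the rest.
id_embs = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
fringe = fringe_indices(id_embs, k=1)

# A hard outlier prototype halfway between an ID embedding and an outlier prototype.
hard_proto = in_between([1.0, 0.0], [0.0, 1.0], lam=0.5)
```

In this toy setup the isolated embedding at index 3 is picked as the fringe class, and the mixed prototype lies on the unit circle midway between its two parents. The cluster-wise variant in Figure 3(b) would apply the same selection inside each cluster instead of over all ID classes.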