Conjugated Semantic Pool Improves OOD Detection with Pre-trained Vision-Language Models

Mengyuan Chen, Junyu Gao, Changsheng Xu

TL;DR

The paper theorizes that improving zero-shot OOD detection requires expanding the semantic pool while increasing the expected probability that the selected OOD labels are activated by OOD samples and keeping the mutual dependence among the activations of these OOD labels low.

Abstract

A straightforward pipeline for zero-shot out-of-distribution (OOD) detection involves selecting potential OOD labels from an extensive semantic pool and then leveraging a pre-trained vision-language model to perform classification on both in-distribution (ID) and OOD labels. In this paper, we theorize that enhancing performance requires expanding the semantic pool while increasing the expected probability that the selected OOD labels are activated by OOD samples and ensuring low mutual dependence among the activations of these OOD labels. A natural way to expand the pool is to adopt a larger lexicon; however, the inevitable introduction of numerous synonyms and uncommon words fails to meet the above requirements, indicating that viable expansion must go beyond merely selecting words from a lexicon. Since OOD detection aims to correctly classify input images into ID/OOD class groups, we can "make up" OOD label candidates that are not standard class names but are beneficial for the process. Observing that the original semantic pool is composed of unmodified specific class names, we correspondingly construct a conjugated semantic pool (CSP) consisting of modified superclass names, each serving as a cluster center for samples sharing similar properties across different categories. Consistent with our established theory, expanding OOD label candidates with the CSP satisfies the requirements and outperforms existing works by 7.89% in FPR95. Code is available at https://github.com/MengyuanChen21/NeurIPS2024-CSP.
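
The pipeline described in the abstract, classification over a joint set of ID and OOD labels, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `id_score`, the temperature value, and the toy unit vectors standing in for vision-language embeddings are all hypothetical, and the scoring rule (the share of softmax mass that falls on ID labels) is one common choice that may differ from the paper's exact score.

```python
import numpy as np


def id_score(image_emb, id_text_embs, ood_text_embs, temperature=0.01):
    """Return the share of softmax mass on ID labels (higher = more ID-like).

    All arguments are hypothetical stand-ins for embeddings that a
    pre-trained vision-language model would produce.
    """
    def normalize(x):
        # L2-normalize so that dot products are cosine similarities.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    img = normalize(np.asarray(image_emb, dtype=float))
    labels = normalize(
        np.concatenate([id_text_embs, ood_text_embs], axis=0).astype(float)
    )

    # Temperature-scaled cosine similarity against every label.
    logits = labels @ img / temperature
    logits = logits - logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()

    # Probability mass assigned to the ID label group.
    return float(probs[: len(id_text_embs)].sum())


# Toy data (assumption): orthonormal rows play the role of label embeddings.
rng = np.random.default_rng(0)
label_embs = np.eye(8, 16)
id_labels, ood_labels = label_embs[:3], label_embs[3:]

# One image close to an ID label, one close to an OOD label candidate.
id_like_image = id_labels[0] + 0.05 * rng.normal(size=16)
ood_like_image = ood_labels[2] + 0.05 * rng.normal(size=16)

score_id_like = id_score(id_like_image, id_labels, ood_labels)
score_ood_like = id_score(ood_like_image, id_labels, ood_labels)
print(f"ID-like image score:  {score_id_like:.3f}")
print(f"OOD-like image score: {score_ood_like:.3f}")
```

Thresholding such a score separates the two groups; in this sketch, enlarging `ood_labels` with additional candidates corresponds to expanding the semantic pool.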

Paper Structure

This paper contains 33 sections, 1 theorem, 34 equations, 16 figures, 10 tables.

Key Result

Lemma 1

Given independent Bernoulli random variables $\{s_1,...,s_m\}$ with parameters $\{p_1,...,p_m\}$, where $0<p_i<1$, as $m$ goes to infinity, the Poisson binomial random variable $C=\sum_{i=1}^m s_i$ converges in distribution to a normal random variable with distribution $\mathcal{N}\left(\sum_{i=1}^{
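
As a numerical illustration of the kind of convergence Lemma 1 describes, the sketch below simulates a Poisson binomial variable $C=\sum_{i=1}^m s_i$ and compares it with a normal approximation. It relies on the standard facts that $C$ has mean $\sum_i p_i$ and variance $\sum_i p_i(1-p_i)$; the values of `m`, `n_trials`, and the uniform range used to draw the $p_i$ are arbitrary assumptions chosen for the demonstration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 500          # number of Bernoulli variables (assumed for illustration)
n_trials = 4000  # number of simulated draws of C (assumed)

# Heterogeneous success probabilities, kept strictly inside (0, 1).
p = rng.uniform(0.05, 0.95, size=m)

# Each row is one realization of (s_1, ..., s_m); C is the row sum.
draws = rng.random((n_trials, m)) < p
C = draws.sum(axis=1)

# Standard mean and variance of a Poisson binomial variable.
mean_theory = p.sum()
var_theory = (p * (1.0 - p)).sum()

# Standardize and check how much mass lies within one standard deviation;
# for a normal variable this fraction is roughly 0.68.
z = (C - mean_theory) / np.sqrt(var_theory)
frac_within_1sd = float(np.mean(np.abs(z) <= 1.0))

print(f"empirical mean {C.mean():.2f} vs theoretical {mean_theory:.2f}")
print(f"empirical var  {C.var():.2f} vs theoretical {var_theory:.2f}")
print(f"fraction within one standard deviation: {frac_within_1sd:.3f}")
```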

Figures (16)

  • Figure 1: Model performance, evaluated by FPR50 and FPR95 (lower is better), of our method and NegLabel against the ratio $r$; both metrics first decline and then increase. Detailed results can be found in Table \ref{tab:ratio}.
  • Figure 2: Model performance, evaluated by FPR95 (lower is better), with lexicons of different sizes. Detailed results can be found in Table \ref{tab:dictionary}.
  • Figure 3: An illustrative diagram of an element in the conjugated semantic pool (CSP). Category names can be regarded as the centers of category clusters. Similarly, elements in CSP can be considered as cluster centers of superclass objects with similar properties.
  • Figure 4: ID Examples of correct OOD detection, correct classification, and high confidence.
  • Figure 5: ID Examples of correct OOD detection, correct classification, and low confidence.
  • ...and 11 more figures

Theorems & Definitions (2)

  • Lemma 1
  • proof