CMOOD: Concept-based Multi-label OOD Detection

Zhendong Liu, Yi Nian, Yuehan Qin, Henry Peng Zou, Li Li, Xiyang Hu, Yue Zhao

TL;DR

CMOOD tackles zero-shot, multi-label OOD detection by expanding the label space with fine-grained positive concepts and semantically distant negative concepts, all evaluated through a CLIP-based framework. The method defines an ID score $S_{ID}(I)$ that combines top-$k$ mean similarities to base labels, positive concepts, and negative concepts, enabling robust separation of ID and OOD without training. Empirical results on VOC and COCO show state-of-the-art AUROC (and favorable FPR@95) across ResNet- and ViT-based CLIP variants, with strong interpretability evidenced by visual explanations. The approach offers a practical, efficient, and scalable solution for real-world multi-label OOD detection, with potential impact on safety-critical applications and avenues for domain-specific concept vocabularies.

Abstract

How can models effectively detect out-of-distribution (OOD) samples in complex, multi-label settings without extensive retraining? Existing OOD detection methods struggle to capture the intricate semantic relationships and label co-occurrences inherent in multi-label settings, often requiring large amounts of training data and failing to generalize to unseen label combinations. While large language models have revolutionized zero-shot OOD detection, they primarily focus on single-label scenarios, leaving a critical gap in handling real-world tasks where samples can be associated with multiple interdependent labels. To address these challenges, we introduce CMOOD, a novel zero-shot multi-label OOD detection framework. CMOOD leverages pre-trained vision-language models, enhancing them with a concept-based label expansion strategy and a new scoring function. By enriching the semantic space with both positive and negative concepts for each label, our approach models complex label dependencies, precisely differentiating OOD samples without the need for additional training. Extensive experiments demonstrate that our method significantly outperforms existing approaches, achieving approximately 95% average AUROC on both VOC and COCO datasets, while maintaining robust performance across varying numbers of labels and different types of OOD samples.

Paper Structure

This paper contains 18 sections, 7 equations, 4 figures, 3 tables, 1 algorithm.

Figures (4)

  • Figure 1: Motivation for CMOOD. Traditional methods struggle with complex multi-label cases. Our approach expands the label space with positive and negative concepts, enabling robust detection of complex OOD samples like "Okapi" and "Spork".
  • Figure 2: Overview of CMOOD. The Concept Generation module uses LLMs to expand base labels into positive ($\mathcal{P}$) and negative ($\mathcal{N}$) concept sets, enhancing the ID-OOD boundary. Positive concepts capture fine-grained, ID-aligned features, while negative concepts provide contrasting OOD-aligned features. The Similarity and ID Score Computation module encodes an input image and computes similarity scores. An ID score based on top-$k$ similarities then classifies the image as ID or OOD.
  • Figure 3: t-SNE plot of label and concept text embeddings, along with corresponding image examples. In scenarios with multiple labels and objects, it is difficult to model the OOD detection problem using a single similarity measure. CMOOD is designed to address this issue.
  • Figure 4: Analysis of CMOOD on ID (left) and OOD (right) examples. For the ID sample (dogs), positive concepts (e.g., "four-legged creature") receive high weights, confirming ID alignment. For the OOD sample (mosquito), negative concepts dominate.
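
The scoring step described in the TL;DR and in Figure 2 can be illustrated with a minimal sketch. This overview does not give the exact formula for $S_{ID}(I)$, so the combination used here (top-$k$ mean similarity to base labels plus top-$k$ mean similarity to positive concepts, minus top-$k$ mean similarity to negative concepts) is an assumption, not the paper's definition. The function names `cosine`, `topk_mean`, `id_score`, and `is_ood`, the toy 2-D vectors, and the threshold are all hypothetical; in practice the embeddings would come from a CLIP image and text encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def topk_mean(values, k):
    """Mean of the k largest values (uses all values if fewer than k; list must be non-empty)."""
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)

def id_score(image_emb, base_embs, pos_embs, neg_embs, k=1):
    """Assumed ID score: high when the image matches base labels and
    positive concepts, low when it matches negative concepts."""
    s_base = [cosine(image_emb, t) for t in base_embs]
    s_pos = [cosine(image_emb, t) for t in pos_embs]
    s_neg = [cosine(image_emb, t) for t in neg_embs]
    # Assumption: simple additive combination; the paper's weighting may differ.
    return topk_mean(s_base, k) + topk_mean(s_pos, k) - topk_mean(s_neg, k)

def is_ood(score, threshold):
    """Flag an image as OOD when its ID score falls below the threshold."""
    return score < threshold

# Toy 2-D stand-ins for text embeddings (hypothetical values).
base = [[1.0, 0.0]]   # e.g. a base label
pos = [[2.0, 0.0]]    # e.g. a fine-grained positive concept
neg = [[0.0, 1.0]]    # e.g. a semantically distant negative concept

id_image = [1.0, 0.0]   # aligned with the label and positive concept
ood_image = [0.0, 3.0]  # aligned with the negative concept

score_id = id_score(id_image, base, pos, neg, k=1)
score_ood = id_score(ood_image, base, pos, neg, k=1)
print(score_id, is_ood(score_id, threshold=0.5))
print(score_ood, is_ood(score_ood, threshold=0.5))
```

Because the score only needs similarities between one image embedding and a fixed set of text embeddings, no training is involved; the threshold would normally be chosen on held-out ID data (for example, at the 95% true-positive rate used for FPR@95).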