CMOOD: Concept-based Multi-label OOD Detection
Zhendong Liu, Yi Nian, Yuehan Qin, Henry Peng Zou, Li Li, Xiyang Hu, Yue Zhao
TL;DR
CMOOD tackles zero-shot, multi-label OOD detection by expanding the label space with fine-grained positive concepts and semantically distant negative concepts, all evaluated through a CLIP-based framework. The method defines an ID score $S_{ID}(I)$ that combines top-$k$ mean similarities to base labels, positive concepts, and negative concepts, enabling robust separation of ID and OOD without training. Empirical results on VOC and COCO show state-of-the-art AUROC (and favorable FPR@95) across ResNet- and ViT-based CLIP variants, with strong interpretability evidenced by visual explanations. The approach offers a practical, efficient, and scalable solution for real-world multi-label OOD detection, with potential impact on safety-critical applications and avenues for domain-specific concept vocabularies.
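The scoring idea in the TL;DR can be illustrated with a minimal sketch. The summary does not state the exact form of $S_{ID}(I)$, so the combination used here (top-$k$ mean over base labels plus top-$k$ mean over positive concepts, minus top-$k$ mean over negative concepts, with equal weights) is an assumption, not the authors' formula. The function names `cosine`, `top_k_mean`, and `id_score` are hypothetical, and the toy 2-D vectors stand in for CLIP image and text embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_mean(sims, k):
    """Mean of the k largest similarities (all of them if fewer than k)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    top = sorted(sims, reverse=True)[:k]
    return sum(top) / len(top)


def id_score(image_emb, base_embs, pos_embs, neg_embs, k=1):
    """Assumed ID score: base + positive - negative top-k mean similarities.

    The equal-weight sum/difference is an illustrative assumption; the
    source only says the three top-k means are combined.
    """
    s_base = top_k_mean([cosine(image_emb, t) for t in base_embs], k)
    s_pos = top_k_mean([cosine(image_emb, t) for t in pos_embs], k)
    s_neg = top_k_mean([cosine(image_emb, t) for t in neg_embs], k)
    return s_base + s_pos - s_neg


# Toy stand-ins for text embeddings of base labels and expanded concepts.
base_embs = [[1.0, 0.0], [0.8, 0.6]]
pos_embs = [[0.6, 0.8], [1.0, 0.0]]
neg_embs = [[0.0, 1.0], [-1.0, 0.0]]

id_image = [1.0, 0.0]   # aligned with a base label
ood_image = [0.0, 1.0]  # aligned with a negative concept

score_id = id_score(id_image, base_embs, pos_embs, neg_embs, k=1)
score_ood = id_score(ood_image, base_embs, pos_embs, neg_embs, k=1)
```

Under this assumed form, an image close to a negative concept receives a lower score than one close to a base label, so thresholding the score separates the two without any training step.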
Abstract
How can models effectively detect out-of-distribution (OOD) samples in complex, multi-label settings without extensive retraining? Existing OOD detection methods struggle to capture the intricate semantic relationships and label co-occurrences inherent in multi-label settings, often requiring large amounts of training data and failing to generalize to unseen label combinations. While large language models have revolutionized zero-shot OOD detection, they primarily focus on single-label scenarios, leaving a critical gap in handling real-world tasks where samples can be associated with multiple interdependent labels. To address these challenges, we introduce CMOOD, a novel zero-shot multi-label OOD detection framework. CMOOD leverages pre-trained vision-language models, enhancing them with a concept-based label expansion strategy and a new scoring function. By enriching the semantic space with both positive and negative concepts for each label, our approach models complex label dependencies, precisely differentiating OOD samples without the need for additional training. Extensive experiments demonstrate that our method significantly outperforms existing approaches, achieving approximately 95% average AUROC on both VOC and COCO datasets, while maintaining robust performance across varying numbers of labels and different types of OOD samples.
