GL-MCM: Global and Local Maximum Concept Matching for Zero-Shot Out-of-Distribution Detection

Atsuyuki Miyai, Qing Yu, Go Irie, Kiyoharu Aizawa

TL;DR

Global-Local Maximum Concept Matching (GL-MCM) incorporates local image scores as an auxiliary score to enhance the separability of global and local visual features. It outperforms baseline zero-shot methods and is comparable to fully supervised methods.

Abstract

Zero-shot out-of-distribution (OOD) detection is a task that detects OOD images during inference with only in-distribution (ID) class names. Existing methods assume ID images contain a single, centered object, and do not consider the more realistic multi-object scenarios, where both ID and OOD objects are present. To meet the needs of many users, the detection method must have the flexibility to adapt the type of ID images. To this end, we present Global-Local Maximum Concept Matching (GL-MCM), which incorporates local image scores as an auxiliary score to enhance the separability of global and local visual features. Due to the simple ensemble score function design, GL-MCM can control the type of ID images with a single weight parameter. Experiments on ImageNet and multi-object benchmarks demonstrate that GL-MCM outperforms baseline zero-shot methods and is comparable to fully supervised methods. Furthermore, GL-MCM offers strong flexibility in adjusting the target type of ID images. The code is available via https://github.com/AtsuMiyai/GL-MCM.
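
The abstract describes the score only in words: a global score plus a local auxiliary score, balanced by a single weight. The following is a minimal sketch of one way such an ensemble could look, not the authors' implementation (see the linked repository for that). It assumes the global score is the maximum softmax over cosine similarities between a global image feature and the class-name text features, that the local score is the same maximum taken over all local region features, and that the two are combined as `global + lam * local`. The function name `gl_mcm_score`, the parameter names `lam` and `temperature`, and the array shapes are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(logits, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def l2_normalize(x, axis=-1):
    # Unit-normalize so that dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def gl_mcm_score(global_feat, local_feats, text_feats, lam=1.0, temperature=1.0):
    """Illustrative global + local max-softmax score (assumed form).

    global_feat: (D,)   global image feature
    local_feats: (R, D) features of R local regions
    text_feats:  (K, D) text features of the K ID class names
    lam:         weight on the local score (hypothetical name)
    temperature: softmax temperature (hypothetical parameter)
    """
    g = l2_normalize(global_feat)
    l = l2_normalize(local_feats)
    t = l2_normalize(text_feats)

    # Global score: max softmax probability over the K classes.
    global_probs = softmax(t @ g / temperature)            # shape (K,)
    global_score = float(global_probs.max())

    # Local score: max softmax probability over all regions and classes.
    local_probs = softmax(l @ t.T / temperature, axis=-1)  # shape (R, K)
    local_score = float(local_probs.max())

    # A higher combined score suggests an in-distribution image.
    return global_score + lam * local_score
```

Under these assumptions, `lam=0.0` reduces the score to the global term alone, and increasing `lam` gives more influence to the best-matching local region, which is the kind of single-parameter control the abstract refers to.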

Paper Structure (20 sections, 6 equations, 6 figures, 10 tables)

Figures (6)

  • Figure 1: Multi-object images in ImageNet. ImageNet contains numerous images with multiple objects. It reflects the reality that many real-world photos do not feature a single ID object, and, instead, they often include OOD objects within the frame.
  • Figure 2: Overview of the Global-Local Maximum Concept Matching (GL-MCM) framework. Our approach utilizes both global and local softmax scores to calculate the ID confidence. By incorporating both global and local scores, our framework compensates for the respective weaknesses of the global and local alignments.
  • Figure 3: Ablation studies on $\lambda$ in Eq. (\ref{eq:gl_mcm}). Users can control the type of ID data they aim to detect by changing $\lambda$: a larger $\lambda$ suits MS-COCO-like images containing both ID and OOD objects, and a smaller $\lambda$ suits ImageNet-like images with dominant ID objects.
  • Figure 4: Comparison of the score histograms, using ImageNet (ID), iNaturalist (OOD), and CLIP-ViT-B/16. GL-MCM can identify ID images that either MCM or L-MCM may misclassify. Furthermore, as shown in (c) and (d), changing $\lambda$ can easily control the score order.
  • Figure A: Visualization of the alignment maps of MCM and L-MCM.
  • ...and 1 more figures