OT-DETECTOR: Delving into Optimal Transport for Zero-shot Out-of-Distribution Detection

Yu Liu, Hao Tang, Haiqi Zhang, Jing Qin, Zechao Li

TL;DR

This work tackles zero-shot out-of-distribution (OOD) detection. It uses CLIP to obtain cross-modal (image and text) representations and introduces OT-DETECTOR, which uses Optimal Transport (OT) to quantify both semantic and distributional discrepancies between test samples and in-distribution (ID) labels. A key novelty is the Semantic-aware Content Refinement (SaCR) module, which uses semantic cues from ID labels to guide multiple views of each test image and thereby magnifies the distributional gap between ID and hard OOD samples. The method defines two OT-based scores, a semantic-wise score and a distribution-wise score, and fuses them into a final score $S_{OT}$. It achieves state-of-the-art performance on ImageNet-1K OOD benchmarks and is especially strong in hard-OOD scenarios. Because it requires neither ID training data nor external OOD labels, it is practical for reliable open-world recognition; its main limitation is sensitivity to batch size, which stems from the OT computation.
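The summary says the two OT scores are fused into $S_{OT}$ but does not state the fusion rule. As a hypothetical illustration only, the sketch below assumes a weighted difference $S_{OT} = S_{mass} - \lambda S_{cost}$, where a higher semantic-wise (mass) score and a lower distribution-wise (cost) score both indicate an ID sample, and then flags samples below a threshold as OOD. The names `fuse_scores`, `flag_ood`, and the weight `lam` are illustrative and do not come from the paper.

```python
import numpy as np


def fuse_scores(mass_score, cost_score, lam=1.0):
    """Hypothetical fusion into one score; higher means more ID-like.

    Assumes mass_score grows and cost_score shrinks for ID samples,
    so the cost term is subtracted with an illustrative weight lam.
    """
    mass = np.asarray(mass_score, dtype=float)
    cost = np.asarray(cost_score, dtype=float)
    return mass - lam * cost


def flag_ood(fused, threshold):
    """Return True where the fused score falls below the threshold (predicted OOD)."""
    return np.asarray(fused) < threshold


# Usage: two samples, the second with low mass and high cost.
fused = fuse_scores([0.9, 0.4], [0.1, 0.8])
print(fused)
print(flag_ood(fused, 0.0))
```

A subtraction is used here only because it keeps the sign convention explicit; any monotone combination (for example, after normalizing each score) would serve the same illustrative purpose.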

Abstract

Out-of-distribution (OOD) detection is crucial for ensuring the reliability and safety of machine learning models in real-world applications. While zero-shot OOD detection, which requires no training on in-distribution (ID) data, has become feasible with the emergence of vision-language models like CLIP, existing methods primarily focus on semantic matching and fail to fully capture distributional discrepancies. To address these limitations, we propose OT-DETECTOR, a novel framework that employs Optimal Transport (OT) to quantify both semantic and distributional discrepancies between test samples and ID labels. Specifically, we introduce cross-modal transport mass and transport cost as semantic-wise and distribution-wise OOD scores, respectively, enabling more robust detection of OOD samples. Additionally, we present a semantic-aware content refinement (SaCR) module, which utilizes semantic cues from ID labels to amplify the distributional discrepancy between ID and hard OOD samples. Extensive experiments on several benchmarks demonstrate that OT-DETECTOR achieves state-of-the-art performance across various OOD detection tasks, particularly in challenging hard-OOD scenarios.
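The abstract names two OT quantities, cross-modal transport mass and transport cost. The sketch below is a generic illustration of those quantities, not the authors' implementation. It assumes L2-normalized CLIP-style image and text embeddings, uniform marginals over test samples and ID labels, a cosine-distance cost $1 - \cos(\cdot,\cdot)$, and entropic OT solved with a plain Sinkhorn loop. It then treats a sample's largest row-normalized mass on any single ID label as a semantic-wise score and its row-normalized transport cost as a distribution-wise score; both definitions, and the names `sinkhorn` and `ot_scores`, are hypothetical.

```python
import numpy as np


def sinkhorn(cost, a, b, eps=0.05, n_iters=200):
    """Entropic OT plan for cost (n, m) with source marginal a (n,) and target marginal b (m,)."""
    K = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)    # rescale rows to match a
        v = b / (K.T @ u)  # rescale columns to match b
    return u[:, None] * K * v[None, :]  # plan whose marginals approximate a and b


def ot_scores(img_feats, txt_feats, eps=0.05):
    """Hypothetical per-sample scores from one OT plan between a test batch and ID labels."""
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    txt = txt_feats / np.linalg.norm(txt_feats, axis=1, keepdims=True)
    cost = 1.0 - img @ txt.T  # cosine distance from every test sample to every ID label
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # assumption: uniform mass over test samples
    b = np.full(m, 1.0 / m)  # assumption: uniform mass over ID labels
    plan = sinkhorn(cost, a, b, eps)
    rows = plan / plan.sum(axis=1, keepdims=True)  # each sample's mass split across labels
    mass_score = rows.max(axis=1)           # higher = mass concentrated on one ID label
    cost_score = (rows * cost).sum(axis=1)  # lower = cheap to transport to ID labels
    return mass_score, cost_score


# Usage: three samples aligned with the three labels, one orthogonal "OOD" sample.
img = np.eye(4)
txt = np.eye(4)[:3]
mass, cost = ot_scores(img, txt)
print(mass)  # about [1, 1, 1, 0.33]
print(cost)  # about [0, 0, 0, 1]
```

In this sketch the label marginal is shared by the whole batch, so a sample's scores change when other samples are added or removed. That is one concrete way OT-based scores can become sensitive to batch size.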

Paper Structure

This paper contains 54 sections, 17 equations, 8 figures, 13 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Illustration of diverse views containing distinct semantic information. The value beneath each view represents its cosine similarity to the text prompt "a photo of a Jay".
  • Figure 2: Pipeline of the Semantic-aware Content Refinement (SaCR) module.
  • Figure 3: Pipeline of our Optimal Transport-based framework OT-detector for zero-shot OOD detection.
  • Figure 4: Analysis of the threshold hyperparameter $k$ on the ImageNet-1K benchmark.
  • Figure 5: Visualization of SaCR for ID/hard-OOD sample pairs: (a) test images; (b) views selected for having the largest margin; (c) views filtered out for having lower margins.
  • ...and 3 more figures
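Figures 1, 4, and 5 indicate that SaCR compares multiple views of a test image against ID label prompts and keeps the views with the largest margin. This summary does not define the margin or the exact role of $k$, so the following is a hypothetical sketch, not the paper's SaCR. It assumes the margin is the gap between a view's top-1 and top-2 cosine similarities to the ID label embeddings, treats `k` as the number of views kept (the paper calls $k$ a threshold, which may mean something different), and averages the kept views into one refined embedding. The name `select_views` is illustrative.

```python
import numpy as np


def select_views(view_feats, label_feats, k):
    """Hypothetical margin-based view filtering for one test image.

    view_feats:  (V, d) embeddings of V views (crops) of the image.
    label_feats: (C, d) text embeddings of ID label prompts, C >= 2.
    k:           number of views to keep (assumed meaning).
    """
    views = view_feats / np.linalg.norm(view_feats, axis=1, keepdims=True)
    labels = label_feats / np.linalg.norm(label_feats, axis=1, keepdims=True)
    sims = views @ labels.T                # (V, C) cosine similarities
    top2 = np.sort(sims, axis=1)[:, -2:]   # two largest similarities per view
    margin = top2[:, 1] - top2[:, 0]       # assumed margin: top-1 minus top-2
    keep = np.argsort(-margin)[:k]         # indices of the k largest margins
    refined = views[keep].mean(axis=0)     # aggregate kept views
    return refined / np.linalg.norm(refined), keep, margin


# Usage: one clear view, one ambiguous view, one moderately clear view.
views = np.array([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.6, 0.8, 0.0]])
labels = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
refined, keep, margin = select_views(views, labels, k=2)
print(keep, margin)
```

Averaging is only one way to combine the kept views; it is chosen here because it keeps the refined embedding in the same space as the label embeddings.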