OT-DETECTOR: Delving into Optimal Transport for Zero-shot Out-of-Distribution Detection
Yu Liu, Hao Tang, Haiqi Zhang, Jing Qin, Zechao Li
TL;DR
This work tackles zero-shot out-of-distribution (OOD) detection by using CLIP's cross-modal representations and introducing OT-DETECTOR, which quantifies both semantic and distributional discrepancies via Optimal Transport (OT). A key novelty is Semantic-aware Content Refinement (SaCR), which refines content through multi-view guidance to magnify the distributional gap between in-distribution (ID) and hard OOD samples. The method defines two OT-based scores, one semantic-wise and one distribution-wise, and fuses them into a final score S_OT. It achieves state-of-the-art performance on ImageNet-1K OOD benchmarks and excels in hard-OOD scenarios. The results point to strong practical value for reliable open-world recognition without ID training data or external OOD labels, although the method is sensitive to batch size because of the OT computation.
Abstract
Out-of-distribution (OOD) detection is crucial for ensuring the reliability and safety of machine learning models in real-world applications. While zero-shot OOD detection, which requires no training on in-distribution (ID) data, has become feasible with the emergence of vision-language models like CLIP, existing methods primarily focus on semantic matching and fail to fully capture distributional discrepancies. To address these limitations, we propose OT-DETECTOR, a novel framework that employs Optimal Transport (OT) to quantify both semantic and distributional discrepancies between test samples and ID labels. Specifically, we introduce cross-modal transport mass and transport cost as semantic-wise and distribution-wise OOD scores, respectively, enabling more robust detection of OOD samples. Additionally, we present a semantic-aware content refinement (SaCR) module, which utilizes semantic cues from ID labels to amplify the distributional discrepancy between ID and hard OOD samples. Extensive experiments on several benchmarks demonstrate that OT-DETECTOR achieves state-of-the-art performance across various OOD detection tasks, particularly in challenging hard-OOD scenarios.
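To make the two scores more concrete, the following is a minimal sketch of how a transport plan between a batch of image features and ID label text features could yield a mass-based and a cost-based score per sample. It is not the authors' implementation: the entropic Sinkhorn solver, the cosine-distance cost, the uniform marginals, the regularization value `eps`, and the names `sinkhorn` and `ot_scores` are all assumptions made for illustration, and the abstract does not specify how the two scores are defined in detail or how they are fused into S_OT.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropic-regularized OT plan with row marginals a and column marginals b."""
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):         # alternate scaling to match both marginals
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def ot_scores(image_feats, text_feats, eps=0.1):
    """Hypothetical per-sample scores derived from one transport plan.

    image_feats: (n, d) features of a test batch (e.g. from an image encoder).
    text_feats:  (k, d) features of the ID label prompts (e.g. from a text encoder).
    """
    x = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    cost = 1.0 - x @ t.T             # assumption: cosine distance as transport cost
    n, k = cost.shape
    a = np.full(n, 1.0 / n)          # assumption: uniform mass over test samples
    b = np.full(k, 1.0 / k)          # assumption: uniform mass over ID labels
    plan = sinkhorn(cost, a, b, eps)
    # Semantic-wise (assumed form): largest share of a sample's mass sent to one label.
    mass_score = plan.max(axis=1) * n
    # Distribution-wise (assumed form): cost paid to transport that sample's mass.
    cost_score = (plan * cost).sum(axis=1) * n
    return plan, mass_score, cost_score
```

Under these assumptions, an ID sample would tend to have a high `mass_score` (its mass concentrates on one label) and a low `cost_score`, while an OOD sample would tend toward the opposite. Because the plan couples every sample in the batch through the shared marginals, both scores depend on which other samples are present, which is one way to see the batch-size sensitivity noted in the TL;DR.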
