TagFog: Textual Anchor Guidance and Fake Outlier Generation for Visual Out-of-Distribution Detection
Jiankang Chen, Tong Zhang, Wei-Shi Zheng, Ruixuan Wang
TL;DR
TagFog presents a novel OOD detection framework that couples Jigsaw-based fake OOD generation with ChatGPT-derived textual anchors encoded by CLIP to train a vision encoder. By optimizing a joint objective that aligns image embeddings with rich textual anchors and applies SupCon-style constraints across ID and fake OOD samples, TagFog achieves state-of-the-art performance and remains compatible with post-hoc OOD scorers like ReAct. Extensive experiments on CIFAR-10/100 and ImageNet100 benchmarks, along with thorough ablations and sensitivity analyses, demonstrate robustness and the complementary value of textual guidance and fake OOD data. The approach offers a practical, flexible route to stronger OOD detection without requiring extra OOD labels, with clear potential for integration into real-world systems.
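To make the idea of aligning image embeddings with textual anchors concrete, here is a minimal sketch under stated assumptions. It is not the paper's exact objective: the cosine-similarity softmax form, the function names `cosine` and `anchor_alignment_loss`, and the `temperature` value are all illustrative choices, and plain Python lists stand in for the CLIP anchor vectors and encoder outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def anchor_alignment_loss(embedding, anchors, label, temperature=0.1):
    """Cross-entropy over temperature-scaled cosine similarities.

    `anchors` holds one (assumed, precomputed) text embedding per ID class;
    `label` is the index of the class the image belongs to.
    """
    logits = [cosine(embedding, a) / temperature for a in anchors]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]


# Toy usage: two class anchors in a 2-D embedding space.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = anchor_alignment_loss([1.0, 0.0], anchors, label=0)
misaligned = anchor_alignment_loss([0.0, 1.0], anchors, label=0)
```

In this toy setup the loss is small when the embedding points toward its own class anchor and large when it points toward another class, which is the behaviour an alignment term of this kind is meant to encourage.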
Abstract
Out-of-distribution (OOD) detection is crucial in many real-world applications. However, intelligent models are often trained solely on in-distribution (ID) data, leading to overconfidence when misclassifying OOD data as ID classes. In this study, we propose a new learning framework that leverages simple Jigsaw-based fake OOD data and rich semantic embeddings ('anchors') from the ChatGPT description of ID knowledge to help guide the training of the image encoder. The learning framework can be flexibly combined with existing post-hoc approaches to OOD detection, and extensive empirical evaluations on multiple OOD detection benchmarks demonstrate that a rich textual representation of ID knowledge and fake OOD knowledge can effectively help train a visual encoder for OOD detection. With the learning framework, new state-of-the-art performance was achieved on all the benchmarks. The code is available at https://github.com/Cverchen/TagFog.
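As an illustration of what Jigsaw-based fake OOD generation can look like, the following sketch cuts an image into a grid of patches and shuffles them, which keeps local texture but breaks global object structure. The function name `jigsaw_shuffle`, the `grid` parameter, and the nested-list image representation are assumptions made for this example; the patch count and any further processing used in the paper are not specified in this section.

```python
import random


def jigsaw_shuffle(image, grid=2, rng=None):
    """Return a copy of `image` with its grid x grid patches randomly permuted.

    `image` is a list of rows, each row a list of pixels (any pixel type).
    Height and width must be divisible by `grid`.
    """
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    if h % grid or w % grid:
        raise ValueError("image height and width must be divisible by grid")
    ph, pw = h // grid, w // grid

    # Cut the image into patches in row-major order.
    patches = [
        [row[c * pw:(c + 1) * pw] for row in image[r * ph:(r + 1) * ph]]
        for r in range(grid)
        for c in range(grid)
    ]
    rng.shuffle(patches)

    # Stitch the shuffled patches back into a full-size image.
    out = []
    for r in range(grid):
        row_patches = patches[r * grid:(r + 1) * grid]
        for y in range(ph):
            out.append([px for patch in row_patches for px in patch[y]])
    return out


# Toy usage: a 4x4 "image" whose pixels record their original coordinates.
img = [[(r, c) for c in range(4)] for r in range(4)]
fake_ood = jigsaw_shuffle(img, grid=2, rng=random.Random(0))
```

The output has the same size and the same pixels as the input, only rearranged at the patch level, so such samples can be produced from ID training images alone without any real OOD data.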
