Concept Matching with Agent for Out-of-Distribution Detection
Yuxiao Lee, Xiaofeng Cao, Jingcai Guo, Wei Ye, Qing Guo, Yi Chang
TL;DR
This work tackles the challenge of robust OOD detection by moving beyond binary in/out boundaries and introducing a zero-shot, agent-based framework. Concept Matching with Agent (CMA) uses neutral textual prompts as agents to form a vector triangle relationship among ID labels, agents, and data inputs within a CLIP-based vision–language space, enabling robust separation without additional training. Empirical results across large-scale and small-scale benchmarks show that CMA outperforms both zero-shot and training-dependent baselines on AUROC and FPR95, and analyses examine the optimal number of agents and the effects of individual agents. The approach offers a scalable, training-free method that adapts to diverse scenarios, with potential for tailored agent design.
Abstract
The remarkable achievements of Large Language Models (LLMs) have attracted the attention of both academia and industry, extending well beyond their initial role in dialogue generation. To expand the usage scenarios of LLMs, some works improve the model's effectiveness and capabilities by introducing additional external information, an approach known as the agent paradigm. Based on this idea, we propose a new method that integrates the agent paradigm into the out-of-distribution (OOD) detection task, aiming to improve its robustness and adaptability. Our proposed method, Concept Matching with Agent (CMA), employs neutral prompts as agents to augment the CLIP-based OOD detection process. These agents function as dynamic observers and communication hubs, interacting with both in-distribution (ID) labels and data inputs to form vector triangle relationships. This triangular framework offers a more nuanced approach than the traditional binary relationship between labels and inputs, allowing for better separation and identification of ID and OOD inputs. Our extensive experimental results show that CMA outperforms both zero-shot and training-required methods across a diverse array of real-world scenarios.
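To make the triangle idea concrete, the following is a minimal sketch of one way agents could enter a CLIP-style score. It is an illustration, not the authors' implementation. Everything in it is an assumption: the scoring rule (maximum softmax probability over ID labels, with agent similarities added to the softmax denominator), the function name `agent_augmented_score`, the temperature of 0.1, and the toy 3-dimensional vectors that stand in for CLIP image and text embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def agent_augmented_score(image_emb, label_embs, agent_embs, temperature=0.1):
    """Hypothetical ID-ness score: max softmax probability over ID labels,
    where agent prompts compete with the labels in the denominator.

    A higher score suggests the input is more likely in-distribution.
    """
    label_logits = [cosine(image_emb, t) / temperature for t in label_embs]
    agent_logits = [cosine(image_emb, t) / temperature for t in agent_embs]
    all_logits = label_logits + agent_logits
    shift = max(all_logits)  # subtract the max for numerical stability
    denom = sum(math.exp(z - shift) for z in all_logits)
    return max(math.exp(z - shift) for z in label_logits) / denom


# Toy stand-ins for embeddings (assumed values, not real CLIP outputs).
label_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two ID label prompts
agent_embs = [[0.0, 0.0, 1.0]]                   # one neutral agent prompt
id_like = [0.9, 0.1, 0.1]                        # input close to the first label
ood_like = [0.1, 0.1, 0.9]                       # input far from both labels

id_score = agent_augmented_score(id_like, label_embs, agent_embs)
ood_score = agent_augmented_score(ood_like, label_embs, agent_embs)
ood_score_no_agent = agent_augmented_score(ood_like, label_embs, [])

print(id_score, ood_score, ood_score_no_agent)
```

In this toy setup, the OOD-like input is equally similar to both labels, so without an agent its score is 0.5. With the agent present, most of the probability mass goes to the agent and the score falls close to zero, while the ID-like input keeps a score close to one. That gap is the kind of separation the label, agent, and input triangle is meant to provide under the assumed scoring rule.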
