AA-CLIP: Enhancing Zero-shot Anomaly Detection via Anomaly-Aware CLIP
Wenxin Ma, Xu Zhang, Qingsong Yao, Fenghe Tang, Chenxu Wu, Yingtai Li, Rui Yan, Zihang Jiang, S. Kevin Zhou
TL;DR
This work tackles zero-shot anomaly detection by addressing CLIP's Anomaly-Unawareness in the text space. It introduces AA-CLIP, a two-stage adaptation that uses Residual Adapters first to create anomaly-aware text anchors and then to align patch-level visual features with these anchors, while keeping the CLIP backbone frozen to preserve generalization. A Disentanglement Loss enforces independence between the normal and anomaly anchors, enabling robust generalization to unseen classes, and multi-granularity patch features are used for precise localization. Empirically, AA-CLIP achieves state-of-the-art results on industrial and medical AD benchmarks with limited training data (e.g., 2-shot) and remains competitive or superior when more data is available, making it an efficient and scalable approach to anomaly detection and localization.
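The TL;DR names two trainable components, Residual Adapters attached to a frozen backbone and a Disentanglement Loss between the normal and anomaly anchors, without giving their exact form. The pure-Python sketch below shows one plausible reading. The bottleneck shape, the ReLU, the scaling factor `alpha`, and the squared-cosine form of the loss are all assumptions made for illustration, not the paper's definitions.

```python
import math


def matvec(weight, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def residual_adapter(x, w_down, w_up, alpha=0.1):
    """Hypothetical bottleneck adapter: x + alpha * W_up(ReLU(W_down x)).

    The frozen backbone feature x passes through unchanged on the skip path;
    only the small branch (w_down, w_up) would be trained. With alpha = 0 the
    adapter is the identity, so the original CLIP feature is recovered.
    """
    hidden = [max(0.0, h) for h in matvec(w_down, x)]  # ReLU (assumed)
    delta = matvec(w_up, hidden)
    return [xi + alpha * di for xi, di in zip(x, delta)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def disentanglement_loss(normal_anchor, anomaly_anchor):
    """Assumed form: squared cosine similarity between the two anchors.

    The loss is 0 when the anchors are orthogonal and 1 when they are
    parallel, so minimizing it pushes the anchors apart.
    """
    return cosine(normal_anchor, anomaly_anchor) ** 2
```

For example, `disentanglement_loss([1.0, 0.0], [0.0, 1.0])` is `0.0`, while two parallel anchors give `1.0`. Squaring the cosine penalizes both positive and negative correlation, which is one simple way to encode "independence" between the anchors.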
Abstract
Anomaly detection (AD) identifies outliers for applications like defect and lesion detection. While CLIP shows promise for zero-shot AD tasks due to its strong generalization capabilities, its inherent Anomaly-Unawareness leads to limited discrimination between normal and abnormal features. To address this problem, we propose Anomaly-Aware CLIP (AA-CLIP), which enhances CLIP's anomaly discrimination ability in both text and visual spaces while preserving its generalization capability. AA-CLIP follows a straightforward yet effective two-stage approach: it first creates anomaly-aware text anchors to differentiate normal and abnormal semantics clearly, then aligns patch-level visual features with these anchors for precise anomaly localization. Using residual adapters, this two-stage strategy adapts CLIP gradually and in a controlled manner, achieving effective AD while maintaining CLIP's class knowledge. Extensive experiments validate AA-CLIP as a resource-efficient solution for zero-shot AD tasks, achieving state-of-the-art results in industrial and medical applications. The code is available at https://github.com/Mwxinnn/AA-CLIP.
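The second stage scores patch-level visual features against the two text anchors. The sketch below shows how such an alignment could turn into an anomaly map, under assumptions that are not taken from the paper: patch features are already projected into the text embedding space, each patch is scored by a softmax over its cosine similarities to the normal and anomaly anchors with a hypothetical temperature of 0.07, and multi-granularity fusion is a plain average of the per-layer scores.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def patch_anomaly_score(patch, normal_anchor, anomaly_anchor, temperature=0.07):
    """Probability that one patch is anomalous (two-way softmax, assumed)."""
    s_n = cosine(patch, normal_anchor) / temperature
    s_a = cosine(patch, anomaly_anchor) / temperature
    m = max(s_n, s_a)  # subtract the max for numerical stability
    e_n, e_a = math.exp(s_n - m), math.exp(s_a - m)
    return e_a / (e_n + e_a)


def anomaly_map(patch_features_per_layer, normal_anchor, anomaly_anchor):
    """Average per-patch scores over several layers (hypothetical fusion).

    patch_features_per_layer: one list of patch vectors per layer, all with
    the same number of patches in the same spatial order.
    """
    num_layers = len(patch_features_per_layer)
    scores = [0.0] * len(patch_features_per_layer[0])
    for layer in patch_features_per_layer:
        for i, patch in enumerate(layer):
            scores[i] += patch_anomaly_score(patch, normal_anchor, anomaly_anchor) / num_layers
    return scores
```

A common way to obtain an image-level score from such a map is to take its maximum, e.g. `max(anomaly_map(layers, normal, anomaly))`; whether AA-CLIP does exactly this is not stated here.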
