AnoPLe: Few-Shot Anomaly Detection via Bi-directional Prompt Learning with Only Normal Samples
Yujin Lee, Seoyoon Jang, Hyunsoo Yoon
TL;DR
AnoPLe addresses few-shot anomaly detection without access to true anomalies by introducing bidirectional, learnable prompts that couple the textual and visual modalities within CLIP, along with a lightweight multi-view decoder and a memory-guided localization mechanism. It simulates anomalies in both pixel and latent spaces and trains with losses that align local (pixel-level) and global (image-level) semantics. The method achieves strong image- and pixel-level AUROCs on the MVTec-AD and VisA benchmarks (e.g., an image-level AUROC, $I$-AUROC, of 94.1% on MVTec-AD and 86.2% on VisA in the 1-shot setting). Across ablations and prompt-guided evaluations, AnoPLe consistently outperforms non-anomaly-aware baselines and remains competitive with state-of-the-art methods that use true anomalies under 1-, 2-, and 4-shot regimes. Because it needs only a few normal samples, it reduces data requirements and makes deployment in industrial inspection easier to scale.
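To make the idea of simulating anomalies from normal samples concrete, here is a minimal sketch in plain Python. It is not the authors' procedure: the summary does not specify how AnoPLe generates anomalies, so the sketch assumes a patch-pasting perturbation in pixel space and additive Gaussian noise in latent space. The function names `paste_patch` and `perturb_features` are hypothetical.

```python
import random


def paste_patch(image, patch_h, patch_w, rng):
    """Pixel-space simulation (assumed, CutPaste-style).

    Copies a random patch of a normal image to another random location.
    `image` is a 2D list of pixel values. Returns the augmented image and a
    binary mask marking the pasted region (the pseudo-anomaly label).
    """
    h, w = len(image), len(image[0])
    # Source and destination top-left corners, chosen independently.
    sy, sx = rng.randrange(h - patch_h + 1), rng.randrange(w - patch_w + 1)
    dy, dx = rng.randrange(h - patch_h + 1), rng.randrange(w - patch_w + 1)
    out = [row[:] for row in image]          # leave the input untouched
    mask = [[0] * w for _ in range(h)]
    for i in range(patch_h):
        for j in range(patch_w):
            out[dy + i][dx + j] = image[sy + i][sx + j]
            mask[dy + i][dx + j] = 1
    return out, mask


def perturb_features(features, sigma, rng):
    """Latent-space simulation (assumed): add Gaussian noise to a feature vector."""
    return [f + rng.gauss(0.0, sigma) for f in features]


rng = random.Random(0)
normal_image = [[r * 8 + c for c in range(8)] for r in range(8)]
pseudo_anomaly, mask = paste_patch(normal_image, patch_h=3, patch_w=2, rng=rng)
noisy_features = perturb_features([0.2, -0.1, 0.7], sigma=0.05, rng=rng)
```

The mask produced by the pixel-space step can serve as a pixel-level training target, while the image as a whole is labeled anomalous for the image-level objective.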
Abstract
Few-shot Anomaly Detection (FAD) poses significant challenges due to the limited availability of training samples and the frequent absence of abnormal samples. Previous approaches often rely on annotations or true abnormal samples to improve detection, but such textual or visual cues are not always accessible. To address this, we introduce AnoPLe, a multi-modal prompt learning method designed for anomaly detection without prior knowledge of anomalies. AnoPLe simulates anomalies and employs bidirectional coupling of textual and visual prompts to facilitate deep interaction between the two modalities. Additionally, we integrate a lightweight decoder with a learnable multi-view signal, trained on multi-scale images to enhance local semantic comprehension. To further improve performance, we align global and local semantics, enriching the image-level understanding of anomalies. The experimental results demonstrate that AnoPLe achieves strong FAD performance, recording 94.1% and 86.2% Image AUROC on MVTec-AD and VisA respectively, with only around a 1% gap compared to the SoTA, despite not being exposed to true anomalies. Code is available at https://github.com/YoojLee/AnoPLe.
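For readers unfamiliar with the reported metric, the following sketch shows how an image-level AUROC can be computed from per-image anomaly scores. It uses the standard pairwise (Mann-Whitney) formulation in plain Python and is only an illustration of the metric, not the evaluation code from the repository above. The function name `auroc` and the toy scores are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `scores` are anomaly scores (higher means more anomalous) and `labels`
    are 1 for anomalous images, 0 for normal ones. The result is the
    probability that a random anomalous image scores higher than a random
    normal one, with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: two normal and two anomalous test images.
image_scores = [0.10, 0.40, 0.35, 0.80]
image_labels = [0, 0, 1, 1]
print(auroc(image_scores, image_labels))  # 0.75
```

The pixel-level AUROC follows the same definition, with every pixel of every test image treated as one sample and the ground-truth mask supplying the labels. The quadratic loop is fine for illustration; a rank-based implementation is preferable at that scale.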
