
PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection

Xiaofan Li, Zhizhong Zhang, Xin Tan, Chengwei Chen, Yanyun Qu, Yuan Xie, Lizhuang Ma

TL;DR

PromptAD tackles one-class, few-shot anomaly detection by learning prompts from normal samples only. It introduces semantic concatenation (SC), which builds anomaly prompts by appending anomaly suffixes to normal prompts, and an explicit anomaly margin (EAM), which uses a hyper-parameter to enforce separation between normal and anomaly prompt features, all on a VV-CLIP backbone. The method takes first place in 11 of 12 few-shot image-level and pixel-level settings on MVTec and VisA (measured by AUROC); ablations indicate that both SC and EAM contribute, and qualitative visualizations illustrate its localization ability. PromptAD thus replaces manual prompt engineering with automated prompt learning for industrial anomaly detection.

Abstract

The vision-language model has brought great improvement to few-shot industrial anomaly detection, but it usually requires designing hundreds of prompts through prompt engineering. For automated scenarios, we first use conventional prompt learning under the many-class paradigm as the baseline to automatically learn prompts, but find that it cannot work well in one-class anomaly detection. To address this problem, this paper proposes a one-class prompt learning method for few-shot anomaly detection, termed PromptAD. First, we propose semantic concatenation, which can transpose normal prompts into anomaly prompts by concatenating normal prompts with anomaly suffixes, thus constructing a large number of negative samples used to guide prompt learning in the one-class setting. Furthermore, to mitigate the training challenge caused by the absence of anomaly images, we introduce the concept of explicit anomaly margin, which explicitly controls the margin between normal prompt features and anomaly prompt features through a hyper-parameter. For image-level/pixel-level anomaly detection, PromptAD achieves first place in 11/12 few-shot settings on MVTec and VisA.
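The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the string-level concatenation, the helper names (`semantic_concatenation`, `margin_loss`), the squared Euclidean distance, and the hinge form of the loss are all assumptions made for illustration, and the example prompts and suffixes are hypothetical.

```python
import math


def semantic_concatenation(normal_prompts, anomaly_suffixes):
    """Build anomaly prompts by appending each suffix to each normal prompt.

    Assumption: prompts are plain strings here; a learned-prompt method
    would concatenate token embeddings instead.
    """
    return [f"{p} {s}" for p in normal_prompts for s in anomaly_suffixes]


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def margin_loss(image_feat, normal_feat, anomaly_feat, margin):
    """Hinge-style margin loss (assumed form, not taken from the paper).

    The loss is zero once the normal image feature is closer to the
    normal prompt feature than to the anomaly prompt feature by at
    least `margin`, the explicit hyper-parameter.
    """
    z, wn, wa = (l2_normalize(v) for v in (image_feat, normal_feat, anomaly_feat))
    return max(0.0, margin + sq_dist(z, wn) - sq_dist(z, wa))


# Hypothetical prompts and suffixes, for illustration only.
anomaly_prompts = semantic_concatenation(
    ["a photo of a bottle", "a cropped photo of a bottle"],
    ["with a crack", "with a scratch", "with contamination"],
)
```

Two normal prompts and three suffixes yield six anomaly prompts, which is how a small set of normal prompts can produce many negative samples without any anomaly images.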

Paper Structure (26 sections, 18 equations, 11 figures, 18 tables)

Figures (11)

  • Figure 1: Left: Prompt learning under many-class and one-class settings. Right: The prompt-guided results of WinCLIP using different numbers of prompts, and the prompt-guided results of the baseline and our PromptAD with one-shot prompt learning. All results are on MVTec.
  • Figure 2: Illustration of PromptAD, which includes two novel modules: SC and EAM. The visual encoder has been transformed with v-v attention. The original branch is used to extract the CLS feature, while the v-v attention branch is used to extract the feature map.
  • Figure 3: Qualitative comparison results of 1-shot pixel-level anomaly detection on MVTec and VisA.
  • Figure 4: Image-level/pixel-level results on VisA in the 1-shot setting using different $N$ and $L$.
  • Figure 5: Image-level/pixel-level results on MVTec in the 1-shot setting using different hyper-parameter $\lambda$.
  • ...and 6 more figures
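
The v-v attention mentioned in the Figure 2 caption can be sketched as follows. This is a hedged, single-head illustration in plain Python, assuming the common formulation in which the query-key similarity of standard attention is replaced by value-value similarity; the function name `vv_attention`, the scaling by the square root of the feature dimension, and the omission of projection matrices are assumptions, not details confirmed by the caption.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def vv_attention(values):
    """Single-head v-v attention over token value vectors.

    values: list of n_tokens vectors, each of length dim.
    Each output token is a weighted average of all value vectors, with
    weights given by softmax of scaled value-value dot products
    (instead of the query-key dot products of standard attention).
    """
    dim = len(values[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for vi in values:
        scores = [scale * sum(a * b for a, b in zip(vi, vj)) for vj in values]
        weights = softmax(scores)
        out.append(
            [sum(w * vj[k] for w, vj in zip(weights, values)) for k in range(dim)]
        )
    return out


# Two toy patch tokens; each attends most strongly to itself.
tokens = [[1.0, 0.0], [0.0, 1.0]]
feature_map = vv_attention(tokens)
```

Because each token is most similar to itself under value-value similarity, the output stays dominated by local content, which is consistent with the caption's use of this branch for the feature map rather than the CLS feature.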