AnomalySD: Few-Shot Multi-Class Anomaly Detection with Stable Diffusion Model

Zhenyu Yan, Qingqing Fang, Wenxi Lv, Qinliang Su

TL;DR

AnomalySD addresses few-shot, multi-class industrial anomaly detection by using inpainting as a localization signal: regions that change most when inpainted as normal are likely anomalous. It adapts Stable Diffusion by fine-tuning the denoising network and VAE decoder, guided by foreground masks and hierarchical text prompts, so that anomalous regions are inpainted as normal content. At inference, it applies multi-scale masks and prototype-guided masks to cover diverse anomalies and computes a fused anomaly score map $\mathcal{S}_{map}$ from the inpainted results. In the one-shot, multi-class setting it reports image-/pixel-level AUROCs of 93.6%/94.8% on MVTec-AD and 86.1%/96.5% on VisA. The approach is competitive with strong baselines, and ablations indicate that the individual modules and the mask and prompt design choices each contribute to the results, which points to diffusion models as a flexible basis for multi-class anomaly detection in industry.

Abstract

Anomaly detection is a critical task in industrial manufacturing, aiming to identify defective parts of products. Most industrial anomaly detection methods assume that sufficient normal data is available for training. This assumption may not hold because of the cost of labeling or data privacy policies. Additionally, mainstream methods require training bespoke models for different objects, which incurs heavy costs and lacks flexibility in practice. To address these issues, we turn to the Stable Diffusion (SD) model for its zero/few-shot inpainting capability, which can be leveraged to inpaint anomalous regions as normal. In this paper, we propose AnomalySD, a few-shot multi-class anomaly detection framework built on the Stable Diffusion model. To adapt SD to the anomaly detection task, we design hierarchical text descriptions and a foreground mask mechanism for fine-tuning SD. In the inference stage, to accurately mask anomalous regions for inpainting, we propose a multi-scale mask strategy and a prototype-guided mask strategy to handle diverse anomalous regions. Hierarchical text prompts are also used to guide inpainting in the inference stage. The anomaly score is estimated from the inpainting results of all masks. Extensive experiments on the MVTec-AD and VisA datasets demonstrate the superiority of our approach. We achieve anomaly classification and segmentation results of 93.6%/94.8% AUROC on the MVTec-AD dataset and 86.1%/96.5% AUROC on the VisA dataset under the multi-class, one-shot setting.
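
The classification and segmentation figures above are image-level and pixel-level AUROC values. As a rough illustration of how such values can be computed from anomaly score maps, the sketch below uses a rank-based AUROC. The function names are hypothetical, and taking the maximum of a score map as the image-level score is an assumption made for this example, not something the abstract specifies.

```python
import numpy as np

def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U statistic), averaging ranks over ties."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels).ravel().astype(bool)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    # For each group of tied scores, compute its 1-based average rank.
    _, first_idx, counts = np.unique(
        sorted_scores, return_index=True, return_counts=True
    )
    avg_rank = first_idx + (counts + 1) / 2.0
    ranks = np.empty(scores.size, dtype=float)
    ranks[order] = np.repeat(avg_rank, counts)
    u_stat = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)

def image_and_pixel_auroc(score_maps, gt_masks):
    """Return (image-level AUROC, pixel-level AUROC) for arrays of shape (N, H, W)."""
    score_maps = np.asarray(score_maps, dtype=float)
    gt_masks = np.asarray(gt_masks).astype(bool)
    n = len(score_maps)
    # Assumption: an image's score is the maximum of its score map.
    image_scores = score_maps.reshape(n, -1).max(axis=1)
    # An image counts as anomalous if any ground-truth pixel is anomalous.
    image_labels = gt_masks.reshape(n, -1).any(axis=1)
    return auroc(image_scores, image_labels), auroc(score_maps, gt_masks)
```

Pixel-level AUROC here pools every pixel of every test image into one ranking, so a few large defects can dominate the value; the image-level value depends on how the map is reduced to a single score.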

Paper Structure (34 sections, 21 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Difference between (a) vanilla unsupervised anomaly detection and (b) our few-shot anomaly detection under the multi-class setting.
  • Figure 2: The framework of AnomalySD. 1) Fine-tuning stage: the Stable Diffusion denoising network is fine-tuned for inpainting on a few-shot normal dataset. 2) Inference stage: the multi-scale masking module and the prototype-guided masking module mask potentially anomalous regions so that they are inpainted as normal. Averaging the error maps from the different masks yields the final anomaly score map $\mathcal{S}_{map}$.
  • Figure 3: Comparison of the reconstruction and localization results of our 1-shot method with those of the full-shot methods DRAEM and EdgRec.
  • Figure 4: Ablation of the noise strength $\lambda$ in the inference stage.
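
The inference stage summarized in the Figure 2 caption (mask a region, inpaint it, measure the error, then average the error maps) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `grid_masks`, `fused_score_map`, and `median_fill_inpaint` are hypothetical names, the median-fill function is only a toy stand-in for the fine-tuned Stable Diffusion inpainter, the masks are plain square tiles, the error is an absolute pixel difference, and the fusion is an unweighted mean over scales. The prototype-guided masks are not covered.

```python
import numpy as np

def grid_masks(height, width, cell):
    """Yield boolean masks, each covering one cell x cell tile of the image."""
    for top in range(0, height, cell):
        for left in range(0, width, cell):
            mask = np.zeros((height, width), dtype=bool)
            mask[top:top + cell, left:left + cell] = True
            yield mask

def median_fill_inpaint(image, mask):
    """Toy inpainter: fill the masked region with the median of unmasked pixels.

    Stand-in for a real inpainting model; requires at least one unmasked pixel.
    """
    restored = image.copy()
    restored[mask] = np.median(image[~mask])
    return restored

def fused_score_map(image, inpaint_fn, cells=(2, 4)):
    """Average per-scale inpainting error maps for a 2D grayscale image."""
    image = np.asarray(image, dtype=float)
    height, width = image.shape
    per_scale = []
    for cell in cells:
        scale_map = np.zeros((height, width))
        for mask in grid_masks(height, width, cell):
            restored = inpaint_fn(image, mask)
            # Only the masked (inpainted) pixels contribute to the error map.
            scale_map[mask] = np.abs(image - restored)[mask]
        per_scale.append(scale_map)
    # Assumption: fuse the scales with a simple unweighted mean.
    return np.mean(per_scale, axis=0)
```

For example, calling `fused_score_map(image, median_fill_inpaint)` on a uniform 8x8 image with one bright pixel produces a map whose only non-zero entry is at that pixel. Small tiles localize small defects, while larger tiles are needed to cover defects that a small mask would only partly hide, which is one motivation for combining several scales.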