MultiADS: Defect-aware Supervision for Multi-type Anomaly Detection and Segmentation in Zero-Shot Learning

Ylli Sadikaj, Hongkuan Zhou, Lavdim Halilaj, Stefan Schmid, Steffen Staab, Claudia Plant

TL;DR

MultiADS addresses the limitation of binary anomaly detection by introducing defect-aware, multi-type anomaly segmentation in zero-shot and few-shot settings. It integrates a Knowledge Base for Anomalies (KBA) with defect-aware text prompts and aligns them to image patches via lightweight adapters in a CLIP backbone, producing per-pixel defect-type maps across multiple datasets. It introduces multi-type anomaly segmentation (MTAS) as a formal task and compares two variants, MultiADS and MultiADS-F, the latter filtering out product-irrelevant defect types to reduce noise. Across MVTec-AD, VisA, MPDD, MAD, and Real-IAD, MultiADS achieves state-of-the-art or competitive results in both MTAS and binary anomaly detection/segmentation, generalizing to unseen defects in both zero-shot and few-shot regimes. Defect-aware supervision matters in practice for automated defect remediation on production lines, because identifying the specific defect type allows a targeted corrective action.

Abstract

Precise optical inspection in industrial applications is crucial for minimizing scrap rates and reducing the associated costs. Beyond merely detecting whether a product is anomalous, it is crucial to know the distinct type of defect, such as a bend, cut, or scratch. The ability to recognize the "exact" defect type enables automated treatment of anomalies in modern production lines. Current methods are limited to detecting whether a product is defective, without providing any insight into the defect type, let alone detecting and identifying multiple defects. We propose MultiADS, a zero-shot learning approach able to perform Multi-type Anomaly Detection and Segmentation. The architecture of MultiADS comprises CLIP and extra linear layers that align the visual and textual representations in a joint feature space. To the best of our knowledge, our proposal is the first approach to perform a multi-type anomaly segmentation task in zero-shot learning. In contrast to the baselines, our approach i) generates specific anomaly masks for each distinct defect type, ii) learns to distinguish defect types, and iii) simultaneously identifies multiple defect types present in an anomalous product. Additionally, our approach outperforms zero/few-shot learning SoTA methods on image-level and pixel-level anomaly detection and segmentation tasks on five commonly used datasets: MVTec-AD, VisA, MPDD, MAD, and Real-IAD.
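
To make the alignment idea in the abstract concrete, the following is a minimal pure-Python sketch of how patch features could be projected by a linear layer into the text space and matched against one "normal" embedding plus K defect-type embeddings, yielding one mask per state. The function names, the toy weight matrix W, the toy embeddings, and the per-patch argmax assignment are illustrative assumptions, not the authors' implementation; in MultiADS the features come from CLIP and the linear layers are learned.

```python
import math

def project(W, x):
    """Apply a bias-free linear layer (matrix W) to one patch feature x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def per_type_masks(patch_features, W, text_embeddings):
    """Return one binary mask per state: index 0 is 'normal', 1..K are defect types.

    Assumption: each patch is assigned to the single most similar state.
    """
    n_states = len(text_embeddings)
    masks = [[0] * len(patch_features) for _ in range(n_states)]
    for i, x in enumerate(patch_features):
        z = project(W, x)                        # patch feature in the joint space
        sims = [cosine(z, t) for t in text_embeddings]
        best = max(range(n_states), key=sims.__getitem__)
        masks[best][i] = 1
    return masks

# Toy data (hypothetical): 2-D patch features, 3-D joint space, K = 2 defect types.
W = [[1, 0], [0, 1], [-1, -1]]
text_embeddings = [[1, 0, 0],   # "normal"
                   [0, 1, 0],   # defect type 1, e.g. "scratch"
                   [0, 0, 1]]   # defect type 2, e.g. "hole"
patches = [[1, 0], [0, 1], [-1, -1]]

print(per_type_masks(patches, W, text_embeddings))
# [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Because every patch is labeled independently, different regions of the same image can receive different defect types, which is what allows several defect types to be reported for one product.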

Paper Structure

This paper contains 39 sections, 9 equations, 15 figures, 33 tables.

Figures (15)

  • Figure 1: Comparison of common approaches and our approach: a) Common approaches typically differentiate only between normal and abnormal states; whereas b) our approach identifies $K+1$ states: one normal state and $K$ distinct abnormal states corresponding to different defect types. This allows our method to distinguish between various defect types.
  • Figure 2: Visualization of text prompt (TP) embeddings of common approaches and ours for the Bracket Brown product of the MPDD dataset, produced with t-SNE. Dots ($\cdot$) represent TP embeddings; plus signs ($+$) represent the average embedding of TPs with the same color.
  • Figure 3: Training phase: $K_1$ text prompts describing the defect types plus one for good products are encoded into $K_1+1$ averaged text embeddings. The image patches are encoded and compared to these embeddings to produce $K_1+1$ similarity maps. For multi-type anomaly segmentation, we use dice and focal loss. Inference phase: we construct $K_2+1$ sets of text prompts. For anomaly segmentation (AS), we up-sample the complement of the normal layer’s similarity map. For anomaly detection (AD), the global anomaly score $a_x$ and the maximum score from the anomaly map are utilized. In few-shot testing, the query image is then compared with multiple reference (normal) images in the testing dataset to generate a similarity map. This similarity map is finally up-sampled and combined with the anomaly map for segmentation and classification tasks.
  • Figure 4: MultiADS simultaneously locates and identifies multi-type anomalies on cashew (a) and candle (b) products.
  • Figure 5: Few-shot image-level AUROC for different k-shot settings on the VisA and MVTec-AD datasets. (* = results taken from the respective papers; AGPT = AnomalyGPT, PCore = PatchCore, PrAD = PromptAD, ApGAN = April-GAN)
  • ...and 10 more figures
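
The Figure 3 caption names dice and focal loss for training and, at inference, takes the complement of the normal layer's similarity map as the anomaly map. The sketch below shows generic textbook forms of these pieces for a single flattened map in pure Python. The binary focal form, gamma = 2, the smoothing term eps, and the weighted average that combines the global score with the maximum of the anomaly map are assumptions made for illustration; the caption does not specify them. Up-sampling and the few-shot reference comparison are omitted.

```python
import math

def dice_loss(pred, target, eps=1e-6):
    """Soft dice loss for one flattened map (pred in [0, 1], target in {0, 1})."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def focal_loss(pred, target, gamma=2.0, eps=1e-6):
    """Binary focal loss averaged over pixels; gamma=2 is an assumed default."""
    total = 0.0
    for p, t in zip(pred, target):
        pt = p if t == 1 else 1.0 - p      # probability given to the true label
        pt = min(max(pt, eps), 1.0)        # clamp to avoid log(0)
        total += -((1.0 - pt) ** gamma) * math.log(pt)
    return total / len(pred)

def anomaly_map(normal_map):
    """Complement of the 'normal' similarity map, as described for AS."""
    return [1.0 - p for p in normal_map]

def image_score(global_score, amap, weight=0.5):
    """Assumed combination: weighted mean of the global score and the map maximum."""
    return weight * global_score + (1.0 - weight) * max(amap)

# Toy example (hypothetical values): four pixels, one defect-type channel.
target = [0, 0, 1, 1]
good_pred = [0.1, 0.1, 0.9, 0.9]
bad_pred = [0.9, 0.9, 0.1, 0.1]
print(dice_loss(good_pred, target) < dice_loss(bad_pred, target))    # True
print(focal_loss(good_pred, target) < focal_loss(bad_pred, target))  # True

amap = anomaly_map([0.9, 0.2])
print(image_score(0.6, amap))
```

In a multi-type setting these losses would be evaluated once per similarity map (one normal channel plus one per defect type) and summed or averaged; how the paper weights the terms is not stated in this excerpt.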