
MAEDAY: MAE for few and zero shot AnomalY-Detection

Eli Schwartz, Assaf Arbelle, Leonid Karlinsky, Sivan Harary, Florian Scheidegger, Sivan Doveh, Raja Giryes

TL;DR

MAEDAY tackles anomaly detection under zero- and few-shot regimes by repurposing a pre-trained MAE for image reconstruction. It leverages multiple random masks and reconstruction consistency to localize anomalies without any training data (ZSAD) and, when only a few normal samples are available, fine-tunes the model via LoRA to improve FSAD performance. The approach can be ensembled with PatchCore to achieve state-of-the-art results on MVTec-AD, and it extends to Zero-Shot Foreign Object Detection (ZSFOD) with competitive performance on texture-like surfaces. By combining pre-trained reconstruction with selective fine-tuning and ensembling, this work broadens practical AD capabilities for industrial inspection and texture analysis.
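The TL;DR does not say how MAEDAY and PatchCore scores are combined. The sketch below shows one common way to ensemble two detectors, assuming each produces one anomaly score per test image: z-normalize each detector's scores over the test set so neither dominates because of its scale, then take a weighted average. The helpers `zscore` and `ensemble_scores` and the `weight` parameter are hypothetical illustrations, not the authors' code, and the toy scores are made up.

```python
from statistics import mean, pstdev


def zscore(scores: list[float]) -> list[float]:
    """Standardize scores to zero mean and unit variance (hypothetical normalization step)."""
    mu = mean(scores)
    sigma = pstdev(scores) or 1.0  # guard against a constant score list
    return [(s - mu) / sigma for s in scores]


def ensemble_scores(maeday: list[float], patchcore: list[float], weight: float = 0.5) -> list[float]:
    """Fuse per-image anomaly scores from two detectors by a weighted average of z-scores.

    `weight` is an assumed hyperparameter: 1.0 uses only the first detector, 0.0 only the second.
    """
    if len(maeday) != len(patchcore):
        raise ValueError("both detectors must score the same images")
    a, b = zscore(maeday), zscore(patchcore)
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]


# Toy per-image scores for four test images (made-up numbers, not results from the paper).
maeday_scores = [0.10, 0.12, 0.11, 0.40]
patchcore_scores = [2.0, 2.1, 5.0, 6.0]
fused = ensemble_scores(maeday_scores, patchcore_scores)
print(fused.index(max(fused)))  # → 3
```

Normalizing before averaging matters because raw reconstruction errors and PatchCore nearest-neighbor distances live on very different scales; without it, the detector with the larger numeric range would decide the ranking almost alone.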

Abstract

We propose using the Masked Auto-Encoder (MAE), a transformer model trained in a self-supervised manner on image inpainting, for anomaly detection (AD), under the assumption that anomalous regions are harder to reconstruct than normal regions. MAEDAY is the first image-reconstruction-based anomaly detection method that utilizes a pre-trained model, enabling its use for Few-Shot Anomaly Detection (FSAD). We also show that the same method works surprisingly well for the novel tasks of Zero-Shot AD (ZSAD) and Zero-Shot Foreign Object Detection (ZSFOD), where no normal samples are available. Code is available at https://github.com/EliSchwartz/MAEDAY .
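The core zero-shot idea (mask most of the image, reconstruct it, and treat large reconstruction error as evidence of an anomaly, averaged over several random masks) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the released implementation: `reconstruct` is a hypothetical stand-in for the pre-trained MAE, each "pixel" plays the role of an MAE patch, the 75% mask ratio is the standard MAE default rather than a value confirmed for MAEDAY, and 32 repetitions follows the saturation point reported in Figure 3. The toy `mean_inpaint` reconstructor is deliberately crude.

```python
import random
from typing import Callable

Image = list[list[float]]
Mask = list[list[bool]]  # True = hidden from the reconstructor


def masked_reconstruction_error(
    image: Image,
    reconstruct: Callable[[Image, Mask], Image],
    mask_ratio: float = 0.75,
    repetitions: int = 32,
    seed: int = 0,
) -> Image:
    """Average per-pixel squared reconstruction error over several random masks.

    Only pixels hidden in a given repetition contribute to that repetition's error,
    since visible pixels are trivially reproduced.
    """
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    err_sum = [[0.0] * w for _ in range(h)]
    counts = [[0] * w for _ in range(h)]
    coords = [(r, c) for r in range(h) for c in range(w)]
    n_masked = int(mask_ratio * len(coords))
    for _ in range(repetitions):
        hidden = set(rng.sample(coords, n_masked))
        mask = [[(r, c) in hidden for c in range(w)] for r in range(h)]
        recon = reconstruct(image, mask)
        for r, c in hidden:
            err_sum[r][c] += (image[r][c] - recon[r][c]) ** 2
            counts[r][c] += 1
    # Pixels that were never masked get a score of 0.0.
    return [[err_sum[r][c] / counts[r][c] if counts[r][c] else 0.0 for c in range(w)] for r in range(h)]


def mean_inpaint(image: Image, mask: Mask) -> Image:
    """Hypothetical toy reconstructor: fill hidden pixels with the mean of the visible ones."""
    visible = [v for row, mrow in zip(image, mask) for v, m in zip(row, mrow) if not m]
    fill = sum(visible) / len(visible)
    return [[fill if m else v for v, m in zip(row, mrow)] for row, mrow in zip(image, mask)]


img = [[0.5] * 8 for _ in range(8)]
img[3][4] = 1.0  # a single "anomalous" pixel that cannot be inferred from its neighbors
scores = masked_reconstruction_error(img, mean_inpaint)
peak = max(((r, c) for r in range(8) for c in range(8)), key=lambda rc: scores[rc[0]][rc[1]])
print(peak)  # → (3, 4)
```

Averaging over many masks reduces the variance that any single random mask introduces: a normal pixel may occasionally be reconstructed poorly because of an unlucky mask, but an anomalous one is poorly reconstructed almost every time it is hidden.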

Paper Structure

This paper contains 9 sections, 1 equation, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: MAEDAY: We repurposed MAE for Zero- and Few-Shot Anomaly Detection. In the zero-shot setup, with no special training and no good images as a reference, an ImageNet pre-trained MAE is used to reconstruct a mostly masked-out query image. Anomalies are detected in regions where the reconstruction fails, since these regions cannot be accurately inferred from their neighbors. The anomaly scores are averaged across multiple reconstructions with different random masks. In the few-shot case, the pre-trained model is further finetuned on reconstructing the available normal images. Figure adapted from he2022masked.
  • Figure 2: ROC-AUC for 0- to 4-shot settings on the MVTec dataset.
  • Figure 3: Number of repetitions per image. Scores for each image are averaged over multiple reconstructions with different random masks. We observe performance saturation at $\sim32$ repetitions.
  • Figure 4: 1-shot examples from the MVTec dataset. The anomaly is usually detected, but the predicted anomalous area tends to be smaller than the ground truth; hence the pixel-level ROC-AUC is lower than the image-level ROC-AUC.
  • Figure 5: Examples of reconstructions for both normal and anomalous images from the MVTec dataset. The model is usually able to recover (a blurry version of) the normal images. In many cases this is enough to detect anomalous regions.
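Figure 4's caption explains why pixel-level ROC-AUC can trail image-level ROC-AUC: the detector fires on the anomaly, but on too small an area. The toy computation below illustrates this effect. It assumes the image-level score is the maximum of the anomaly map (a common convention that this summary does not confirm for MAEDAY), and `roc_auc` is a hypothetical helper using the Mann-Whitney formulation (the probability that a random positive outscores a random negative, with ties counted as one half). The maps are made-up values, not data from the paper.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """ROC-AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical flattened anomaly maps for one normal and one anomalous image.
normal_map = [0.1] * 6
anomalous_map = [0.9, 0.1, 0.1, 0.1, 0.1, 0.1]
normal_gt = [0] * 6
anomalous_gt = [1, 1, 1, 1, 0, 0]  # true defect covers 4 pixels; the map fires on only 1

pixel_auc = roc_auc(normal_map + anomalous_map, normal_gt + anomalous_gt)
# Assumed convention: the image-level score is the maximum of its anomaly map.
image_auc = roc_auc([max(normal_map), max(anomalous_map)], [0, 1])
print(pixel_auc, image_auc)  # → 0.625 1.0
```

One confidently detected pixel is enough to rank the anomalous image above the normal one, but the three missed defect pixels are indistinguishable from normal pixels, which pulls the pixel-level score down.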