SPot-the-Difference Self-Supervised Pre-training for Anomaly Detection and Segmentation

Yang Zou, Jongheon Jeong, Latha Pemula, Dongqing Zhang, Onkar Dabeer

TL;DR

This work tackles the scarcity of labeled anomalies by introducing the Visual Anomaly (VisA) dataset and a novel SPot-the-Difference (SPD) regularization for self-supervised pre-training. SPD promotes local sensitivity to defects by using SmoothBlend-based local perturbations as negatives and weak global augmentations as positives within contrastive learning, and can also augment supervised pre-training with an auxiliary local-perturbation task. Across VisA and MVTec-AD, SPD consistently improves anomaly detection and segmentation performance, especially in 2-class high-shot and low-shot regimes, often closing the gap with supervised pre-training. The authors provide extensive ablations showing robust gains, analyze augmentation choices, and open-source the code for broader adoption in industrial defect detection workflows.
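The local-perturbation idea can be illustrated with a small sketch. This section does not spell out SmoothBlend's exact steps, so the code below is only an assumed approximation: it cuts a small patch, jitters its brightness, and pastes it at a random location through a softened alpha mask. The function name `smooth_blend` and the parameters `patch_frac`, `jitter`, and `blur_passes` are hypothetical and not taken from the paper or its repository.

```python
import numpy as np


def smooth_blend(image, rng, patch_frac=0.1, jitter=0.3, blur_passes=3):
    """Hypothetical SmoothBlend-style local perturbation (an assumption).

    image: float array of shape (H, W, C) with values in [0, 1].
    Returns a copy in which a small, brightness-jittered patch cut from
    the image is alpha-blended back in at a random location.
    """
    h, w, _ = image.shape
    ph, pw = max(2, int(h * patch_frac)), max(2, int(w * patch_frac))

    # Cut a small patch from a random source location.
    sy, sx = rng.integers(0, h - ph + 1), rng.integers(0, w - pw + 1)
    patch = image[sy:sy + ph, sx:sx + pw].copy()

    # Perturb the patch appearance (simple brightness scaling, assumed).
    patch = np.clip(patch * (1.0 + rng.uniform(-jitter, jitter)), 0.0, 1.0)

    # Build a hard mask at a random target location, then soften it with
    # repeated 3x3 box filtering so the paste has no sharp border.
    ty, tx = rng.integers(0, h - ph + 1), rng.integers(0, w - pw + 1)
    alpha = np.zeros((h, w), dtype=image.dtype)
    alpha[ty:ty + ph, tx:tx + pw] = 1.0
    for _ in range(blur_passes):
        padded = np.pad(alpha, 1, mode="edge")
        alpha = sum(
            padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)
        ) / 9.0

    # Paste the patch, then blend pasted and original images with the mask.
    pasted = image.copy()
    pasted[ty:ty + ph, tx:tx + pw] = patch
    a = alpha[..., None]
    return a * pasted + (1.0 - a) * image
```

In a contrastive setup of the kind described above, the output of such a function would play the role of the negative view, while a weakly and globally augmented copy of the same image would serve as the positive view.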

Abstract

Visual anomaly detection is commonly used in industrial quality inspection. In this paper, we present a new dataset as well as a new self-supervised learning method for ImageNet pre-training to improve anomaly detection and segmentation in 1-class and 2-class 5/10/high-shot training setups. We release the Visual Anomaly (VisA) Dataset consisting of 10,821 high-resolution color images (9,621 normal and 1,200 anomalous samples) covering 12 objects in 3 domains, making it the largest industrial anomaly detection dataset to date. Both image-level and pixel-level labels are provided. We also propose a new self-supervised framework, SPot-the-difference (SPD), which can regularize contrastive self-supervised pre-training methods such as SimSiam, MoCo and SimCLR to be more suitable for anomaly detection tasks. Our experiments on the VisA and MVTec-AD datasets show that SPD consistently improves these contrastive pre-training baselines and even supervised pre-training. For example, SPD improves the Area Under the Precision-Recall curve (AU-PR) for anomaly segmentation by 5.9% and 6.8% over SimSiam and supervised pre-training, respectively, in the 2-class high-shot regime. We open-source the project at http://github.com/amazon-research/spot-diff.
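To make the idea of regularizing a contrastive objective concrete, here is a minimal sketch. The exact SPD loss is not given in this section, so the form below is an assumption: a cosine-similarity term that rewards agreement between an anchor and its weakly augmented view and penalizes agreement with a locally perturbed view, added to a base loss with a weight. The names `spd_regularizer` and `total_loss` and the default `weight=0.1` are hypothetical.

```python
import numpy as np


def cosine(a, b, eps=1e-8):
    """Row-wise cosine similarity between two (N, D) arrays."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return num / den


def spd_regularizer(z_anchor, z_weak, z_local):
    """Assumed form: mean of cos(anchor, local) - cos(anchor, weak).

    z_anchor: embeddings of the original images.
    z_weak:   embeddings of weakly, globally augmented views (positives).
    z_local:  embeddings of locally perturbed views (negatives).
    Lower values mean the anchor is closer to its positive than its negative.
    """
    return float(np.mean(cosine(z_anchor, z_local) - cosine(z_anchor, z_weak)))


def total_loss(base_loss, z_anchor, z_weak, z_local, weight=0.1):
    """Base contrastive loss plus the weighted regularizer (weight is assumed)."""
    return base_loss + weight * spd_regularizer(z_anchor, z_weak, z_local)
```

Here `base_loss` stands in for whatever objective the underlying method (for example SimSiam, MoCo or SimCLR) already computes; the sketch only shows how an extra term could be attached to it.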

Paper Structure (19 sections, 7 equations, 12 figures, 14 tables)

Figures (12)

  • Figure 1: (a) Normal and anomalous samples of VisA - PCB1 with a real defect (molten metal), with the anomaly highlighted by a red ellipse; (b) A pair of images for the spot-the-difference (SPD) puzzle [jhamtani2018learning]; (c) An anchor image and its variant augmented by SmoothBlend for synthetic spot-the-difference; (d) GradCAM attention visualization for a PCB1 anomaly image based on self-supervised ImageNet pre-training with and without the proposed SPD. With SPD, attention is more focused on the local defects.
  • Figure 2: (a) Contrastive learning in SimCLR, MoCo and SimSiam; (b) Contrastive learning in SPD training. The local deformation in the SPD negative is highlighted by a circle.
  • Figure 3: (a) Samples for synthetic spot-the-difference; (b) Augmentation comparison
  • Figure 4: The contrastive spot-the-difference learning
  • Figure 5: Samples from the VisA dataset. First row: normal images; Second row: anomalous images; Third row: anomalies viewed by zooming in.
  • ...and 7 more figures