ToCoAD: Two-Stage Contrastive Learning for Industrial Anomaly Detection

Yun Liang, Zhiguang Hu, Junjie Huang, Donglin Di, Anyang Su, Lei Fan

TL;DR

This work tackles the domain gap between pre-trained feature extractors and industrial anomaly data in unsupervised anomaly detection. It introduces ToCoAD, a two-stage training framework. First, a discriminative network is trained on synthetic anomalies to coarsely locate defects. Then, the feature extractor is fine-tuned jointly with a contrastive learning network through bootstrap contrastive learning, with the discriminative network providing negative guidance. A memory-bank-based mechanism performs the final localization. Empirical results on MVTec AD, VisA, and BTAD show competitive pixel-level and image-level AUROC, with particular strength when using Perlin-noise-generated anomalies and a SimSiam-based contrastive loss. Ablation analyses confirm the importance of the two-stage design, the choice of anomaly generator, and the role of the coreset-subsampled memory bank in robust industrial anomaly localization.
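
The TL;DR credits part of the gain to a SimSiam-based contrastive loss, and Figure 4 describes its two-branch structure. Assuming the loss $L_{cossim}$ follows the standard symmetrized SimSiam objective (negative cosine similarity between one view's prediction and the other view's projection), a minimal NumPy sketch looks like this. The function names are hypothetical, and because NumPy has no autograd, the stop-gradient on $z$ appears only as a comment; in a training framework the projections would be detached before the loss is computed.

```python
import numpy as np

# Hypothetical helper names; a sketch of the standard SimSiam loss, not the paper's code.


def neg_cosine(p: np.ndarray, z: np.ndarray) -> float:
    """Mean negative cosine similarity between the rows of p and z.

    In SimSiam, z is treated as a constant (stop-gradient). NumPy has no
    autograd, so this only matters when porting to a framework, where z
    would be detached before this call.
    """
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return float(-np.mean(np.sum(p * z, axis=1)))


def simsiam_loss(p1, p2, z1, z2) -> float:
    """Symmetrized loss: each view's prediction is matched to the other view's projection."""
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)


# Usage with a random batch of 4 vectors of dimension 16 (hypothetical sizes).
rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
p1, p2 = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
print(simsiam_loss(p1, p2, z1, z2))  # some value in [-1, 1]
```

The minimum of -1 is reached when every prediction points in the same direction as the other view's projection. The SimSiam authors report that the stop-gradient on the predictor-free branch is what keeps this objective from collapsing to a constant output.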

Abstract

Current unsupervised anomaly detection approaches perform well on public datasets but struggle with specific anomaly types because of the domain gap between pre-trained feature extractors and target-specific domains. To tackle this issue, this paper presents a two-stage training strategy called ToCoAD. In the first stage, a discriminative network is trained on synthetic anomalies in a self-supervised manner. In the second stage, this network provides negative feature guidance for training the feature extractor through bootstrap contrastive learning. This approach enables the model to progressively learn the distribution of anomalies specific to industrial datasets, improving its generalization to various types of anomalies. Extensive experiments demonstrate the effectiveness of the proposed two-stage training strategy, and the model achieves competitive performance, with pixel-level AUROC scores of 98.21%, 98.43%, and 97.70% on MVTec AD, VisA, and BTAD, respectively.
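
In the first stage the discriminative network is trained on synthetic anomalies, and Figure 3 describes how the generator $\mathbf{G}$ produces them: rotate the normal image, mask a texture sample with thresholded noise, and fuse the result into the rotated image. The sketch below mirrors that pipeline under several assumptions. It uses smooth value noise as a stand-in for true Perlin noise, omits the small-angle rotation, requires square images so that 90-degree rotations keep the shape, and blends with a DRAEM-style opacity `beta`, since the caption does not give the exact fusion rule. All names and parameter values are hypothetical.

```python
import numpy as np

# Hypothetical helpers sketching the generator G of Figure 3; not the paper's code.


def smooth_noise(h: int, w: int, cell: int, rng: np.random.Generator) -> np.ndarray:
    """Value noise in [0, 1]: a coarse random grid bilinearly upsampled to (h, w).

    A simple stand-in for Perlin noise (Perlin interpolates random gradients,
    not random values), used here only to produce blob-shaped masks.
    """
    grid = rng.random((h // cell + 2, w // cell + 2))
    ys, xs = np.arange(h) / cell, np.arange(w) / cell
    y0, x0 = ys.astype(int), xs.astype(int)
    ty, tx = (ys - y0)[:, None], (xs - x0)[None, :]
    ty, tx = ty * ty * (3 - 2 * ty), tx * tx * (3 - 2 * tx)  # smoothstep fade
    top = grid[y0][:, x0] * (1 - tx) + grid[y0][:, x0 + 1] * tx
    bottom = grid[y0 + 1][:, x0] * (1 - tx) + grid[y0 + 1][:, x0 + 1] * tx
    return top * (1 - ty) + bottom * ty


def synthesize_anomaly(image, texture, rng, threshold=0.6, beta=0.5, cell=16):
    """Return (synthetic image I_G, binary mask) for square H x H x C float images.

    threshold, beta, and cell are hypothetical: mask binarization level,
    blend opacity, and noise scale. The small-angle rotation is omitted.
    """
    rotated = np.rot90(image, k=int(rng.integers(4)))  # I_R: random 90-degree rotation
    h, w = rotated.shape[:2]
    mask = (smooth_noise(h, w, cell, rng) > threshold).astype(image.dtype)[..., None]
    anomaly_region = mask * texture  # A_P: texture kept only inside the noise mask
    fused = (1 - mask) * rotated + mask * (1 - beta) * rotated + beta * anomaly_region
    return fused, mask[..., 0]


# Usage with random stand-ins for a normal image I and a texture sample A.
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
texture = rng.random((64, 64, 3))
fused, mask = synthesize_anomaly(image, texture, rng)
print(fused.shape, mask.mean())  # (64, 64, 3) and the fraction of altered pixels
```

The returned mask is the kind of pixel-level label a segmentation-style discriminative network could be trained against; the exact loss used for $\mathbf{D}$ is not given in this section.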

Paper Structure (19 sections, 9 equations, 7 figures, 9 tables)

Figures (7)

  • Figure 1: Existing methods rely on frozen pre-trained feature extractors, which can lead to inaccuracies in anomaly detection. In contrast, our method utilizes a two-stage training strategy to fine-tune the feature extractor under the contrastive learning paradigm.
  • Figure 2: Overview of our two-stage training strategy, ToCoAD. First, a synthetic anomaly image is generated by the anomaly generator $\mathbf{G}$ to train a discriminative network $\mathbf{D}$. Then, the feature extractor $\mathbf{F}$ is fine-tuned jointly by the contrastive learning network $\mathbf{C}$ and the pre-trained discriminative network $\mathbf{D}$, using synthetic anomaly images and augmented image pairs.
  • Figure 3: Overview of the anomaly generator $\mathbf{G}$ with Perlin noise. First, the normal image $I$ undergoes a random small-angle rotation and a random 90-degree rotation to obtain $I_R$. Second, the Perlin noise $P$ and the texture sample $A$ are combined with a bitwise AND operation to generate the anomalous region $A_P$. Finally, the anomalous region $A_P$ and the rotated image $I_R$ are fused to obtain the synthetic anomaly sample $I_\mathbf{G}$.
  • Figure 4: Detailed structure of the contrastive learning network $\mathbf{C}$. A normal image $I$ is augmented to obtain two views $v^1, v^2$, which are passed through the feature extractor and the contrastive learning network to obtain the projected features $z^1, z^2$ and the predicted features $p^1, p^2$. The loss $L_{cossim}$ is computed between the two branches, with the stop-gradient operation applied to the branch that has no predictor.
  • Figure 5: Pipeline for building the memory bank. The fine-tuned feature extractor extracts adapted patch features from normal samples, and these features are stored in the memory bank after coreset subsampling. During inference, anomalies are detected by computing the Euclidean distance between each adapted patch feature of the test image and its nearest neighbor in the coreset.
  • ...and 2 more figures
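
Figure 5 describes the localization stage: patch features from normal images are subsampled into a coreset memory bank, and each test patch is scored by its Euclidean distance to the nearest stored feature. A minimal NumPy sketch of that pipeline follows, assuming greedy k-center (farthest-point) selection as the coreset method, as popularized by PatchCore; the caption does not say which coreset algorithm ToCoAD uses. Random vectors stand in for the fine-tuned extractor's features, the helper names are hypothetical, and taking the maximum patch score as the image-level score is also an assumption.

```python
import numpy as np

# Hypothetical helpers; greedy k-center coreset + nearest-neighbor scoring (PatchCore-style).


def greedy_coreset(features: np.ndarray, n_select: int, rng: np.random.Generator) -> np.ndarray:
    """Greedy k-center (farthest-point) subsampling of the rows of `features`."""
    n = features.shape[0]
    selected = [int(rng.integers(n))]
    # Distance from every feature to its nearest already-selected center.
    min_dist = np.linalg.norm(features - features[selected[0]], axis=1)
    for _ in range(n_select - 1):
        idx = int(np.argmax(min_dist))  # farthest point from the current coreset
        selected.append(idx)
        min_dist = np.minimum(min_dist, np.linalg.norm(features - features[idx], axis=1))
    return features[selected]


def patch_scores(test_features: np.ndarray, memory_bank: np.ndarray) -> np.ndarray:
    """Euclidean distance from each test patch feature to its nearest memory-bank entry."""
    dists = np.linalg.norm(test_features[:, None, :] - memory_bank[None, :, :], axis=2)
    return dists.min(axis=1)


# Usage with random stand-in features (hypothetical sizes: 500 patches, 32 dims).
rng = np.random.default_rng(0)
normal_feats = rng.normal(size=(500, 32))
bank = greedy_coreset(normal_feats, n_select=50, rng=rng)
test_feats = rng.normal(size=(10, 32))
scores = patch_scores(test_feats, bank)  # per-patch anomaly map values
image_score = scores.max()               # one common image-level aggregation
```

Greedy k-center keeps the bank small while bounding how far any normal feature lies from its nearest stored entry, so normal test patches still tend to score low after subsampling. The full pairwise-distance matrix here costs memory proportional to (number of test patches) x (bank size); a practical implementation would batch it or use a nearest-neighbor index.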