ReConPatch : Contrastive Patch Representation Learning for Industrial Anomaly Detection

Jeeho Hyun, Sangyun Kim, Giyoung Jeon, Seung Hwan Kim, Kyunghoon Bae, Byung Jun Kang

TL;DR

This work tackles industrial anomaly detection by learning a target-oriented patch representation space with ReConPatch, a two-network framework that applies relaxed contrastive learning using pairwise and contextual similarities as pseudo-labels. ReConPatch modulates features from a pre-trained CNN through a lightweight representation head and stabilizes the similarity computations with a slowly updated partner network. It achieves state-of-the-art performance on the MVTec AD dataset (image AUROC up to 99.72% in ensembles) and strong results on BTAD (AUROC 95.8%), with notable segmentation gains. The method avoids heavy data augmentation and uses a coreset memory bank for efficient, scalable anomaly scoring. Overall, ReConPatch delivers robust, high-precision anomaly detection and localization in industrial settings, and refinement-based localization could improve it further.
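The summary says only that the partner network is "slowly updated". One common way to do that is an exponential moving average (EMA) of the trained network's parameters, and the minimal sketch below assumes that scheme. The function name `ema_update`, the `momentum` value, and the dict-of-arrays parameter layout are hypothetical choices for illustration, not details taken from this page.

```python
import numpy as np

def ema_update(partner, online, momentum=0.9):
    """Move every partner parameter a small step toward the online parameter.

    Assumption: an EMA rule; a momentum close to 1 means a slower partner.
    """
    return {name: momentum * partner[name] + (1.0 - momentum) * online[name]
            for name in partner}

# Toy parameters standing in for the trained (online) and partner networks.
online = {"w": np.array([1.0, 2.0])}
partner = {"w": np.array([0.0, 0.0])}

partner = ema_update(partner, online, momentum=0.9)
print(partner["w"])
```

Because the partner changes slowly, the similarities computed from it shift only gradually between training steps, which is the stabilizing effect the summary describes.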

Abstract

Anomaly detection is crucial to the advanced identification of product defects such as incorrect parts, misaligned components, and damage in industrial manufacturing. Because defects are rarely observed and their types are unknown in advance, anomaly detection is considered a challenging machine learning problem. To overcome this difficulty, recent approaches take common visual representations pre-trained on natural image datasets and distill the relevant features. However, existing approaches still suffer from a discrepancy between the pre-trained features and the target data, or require input augmentation that must be carefully designed, particularly for industrial datasets. In this paper, we introduce ReConPatch, which constructs discriminative features for anomaly detection by training a linear modulation of patch features extracted from the pre-trained model. ReConPatch employs contrastive representation learning to collect and distribute features in a way that produces a target-oriented and easily separable representation. To address the absence of labeled pairs for contrastive learning, we use two similarity measures between data representations, pairwise and contextual similarity, as pseudo-labels. Our method achieves state-of-the-art anomaly detection performance (99.72% image-level AUROC) on the widely used and challenging MVTec AD dataset. It also achieves state-of-the-art anomaly detection performance (95.8% AUROC) on the BTAD dataset.
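To make the idea of contrastive learning with soft pseudo-labels concrete, here is a minimal NumPy sketch of a relaxed contrastive loss. The abstract does not give the formula, so the specific form is an assumption modeled on relaxed contrastive losses from the embedding-transfer literature: a squared relative distance for attraction and a squared hinge with margin for repulsion. The names `relaxed_contrastive_loss`, `omega`, and `margin` are hypothetical.

```python
import numpy as np

def relaxed_contrastive_loss(z, omega, margin=1.0):
    """z: (N, D) embeddings; omega: (N, N) soft pseudo-labels in [0, 1].

    Assumed form: pairs with omega near 1 are pulled together, pairs with
    omega near 0 are pushed apart until their relative distance exceeds margin.
    """
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)   # (N, N)
    # Relative distance: scale each row by the anchor's mean distance.
    delta = dist / (dist.mean(axis=1, keepdims=True) + 1e-12)
    attract = omega * delta ** 2
    repel = (1.0 - omega) * np.maximum(margin - delta, 0.0) ** 2
    return (attract.sum() + repel.sum()) / z.shape[0]

# Two toy embeddings labeled as fully similar: only the attraction term is active.
z = np.array([[0.0, 0.0], [3.0, 4.0]])
loss_similar = relaxed_contrastive_loss(z, np.ones((2, 2)))
print(loss_similar)
```

Unlike a hard 0/1 contrastive label, `omega` lets a pair contribute partly to attraction and partly to repulsion, which is what allows similarity measures to act as pseudo-labels when no labeled pairs exist.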

Paper Structure (28 sections, 12 equations, 10 figures, 15 tables)

Figures (10)

  • Figure 1: Overall structure of anomaly detection using ReConPatch. ReConPatch consists of two networks that train representations of patch-level features, each comprising a feature representation layer ($f$ or $\bar{f}$) and a projection layer ($g$ or $\bar{g}$). The upper networks ($\bar{f}, \bar{g}$) are used to calculate pairwise and contextual similarities between patch-level feature pairs, while the bottom networks ($f, g$), used for representation learning of patch-level features, are trained with the relaxed contrastive loss $\mathcal{L}_{RC}$.
  • Figure 2: Illustrative examples of similarity measures in the representation space. The pairwise similarity $\omega_{ij}^{Pairwise}$ between $\bar{z}_i$ and $\bar{z}_j$ is identical in (a) and (b). In (a), the $k$-nearest neighborhoods $\mathcal{N}_k(i)$ and $\mathcal{N}_k(j)$ do not enclose each other, so $\omega_{ij}^{Contextual}$ takes a lower value and the pair $\bar{z}_i$, $\bar{z}_j$ should be pushed apart. By contrast, in (b), $\mathcal{N}_k(i)$ and $\mathcal{N}_k(j)$ enclose each other, so $\omega_{ij}^{Contextual}$ takes a higher value and the pair $\bar{z}_i$, $\bar{z}_j$ should attract each other.
  • Figure 3: An illustrative comparison of features mapped by (a) PatchCore and (b), (c), (d) ReConPatch on the MVTec AD dataset. The scatter plots show the feature space of each method, colored according to pixel position.
  • Figure 4: Histogram of the anomaly scores of normal and abnormal data for the bottle class. ReConPatch shows high discriminability, as indicated by the $d'$ measure.
  • Figure 5: Examples of images with anomalies (top) and measured anomaly score maps (bottom) on the MVTec AD dataset. The orange line depicts the ground truth of the anomalies, and the green line depicts the threshold that optimizes the F1 score of anomaly segmentation. The green star marks the location of the maximum anomaly score in the heatmap.
  • ...and 5 more figures
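
The two similarity measures contrasted in the Figure 2 caption can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact definition: the pairwise similarity is taken to be a Gaussian kernel on squared Euclidean distance, the contextual similarity is reduced to the fraction of shared $k$-nearest neighbors, and the two are mixed linearly. The names `pairwise_similarity`, `contextual_similarity`, `sigma`, `k`, and `alpha` are hypothetical.

```python
import numpy as np

def squared_distances(z):
    """All-pairs squared Euclidean distances for an (N, D) array."""
    return ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)

def pairwise_similarity(z, sigma=1.0):
    """Assumed Gaussian kernel: depends only on the distance between i and j."""
    return np.exp(-squared_distances(z) / sigma)

def contextual_similarity(z, k=3):
    """Simplified stand-in: share of k-nearest neighbors that i and j have in common."""
    sq = squared_distances(z)
    knn = np.argsort(sq, axis=1)[:, :k]          # each point is its own nearest neighbor
    member = np.zeros_like(sq, dtype=bool)       # member[i, n] is True if n is in N_k(i)
    np.put_along_axis(member, knn, True, axis=1)
    shared = (member[:, None, :] & member[None, :, :]).sum(-1)
    return shared / k

# Two well-separated toy clusters of three points each.
z_bar = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
pw = pairwise_similarity(z_bar)
ctx = contextual_similarity(z_bar, k=3)

alpha = 0.5                                      # hypothetical mixing weight
omega = alpha * pw + (1.0 - alpha) * ctx         # soft pseudo-labels in [0, 1]
print(ctx[0, 1], ctx[0, 3])
```

In this toy data, points in the same cluster share all of their neighbors while points in different clusters share none, so the contextual term separates the clusters even where distance alone would be ambiguous.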