Patch-wise Auto-Encoder for Visual Anomaly Detection

Yajie Cui, Zhaoxiang Liu, Shiguo Lian

TL;DR

The paper addresses unsupervised anomaly detection under limited anomaly samples by proposing Patch AE, a patch-wise auto-encoder that strengthens reconstruction sensitivity to defects through artificial defect augmentation and patch-level decoding. It learns a defect-sensitive, multi-scale feature representation via a pre-trained backbone and a one-to-one patch reconstruction scheme, and detects anomalies by nearest-neighbor distances in feature space, producing a patch-wise anomaly map and a final image score. The approach achieves state-of-the-art results on the MVTec AD benchmark, notably a single-model AUROC of $99.48\%$, while remaining computationally efficient compared to multi-model baselines. This work offers a practical and scalable solution for industrial defect detection with strong generalization to unseen anomalies.

Abstract

Anomaly detection without priors on the anomalies is challenging. In unsupervised anomaly detection, the traditional auto-encoder (AE) tends to fail because it relies on the assumption that a model trained only on normal images will be unable to reconstruct abnormal images correctly. In contrast, we propose a novel patch-wise auto-encoder (Patch AE) framework, which aims to enhance the AE's ability to reconstruct anomalies rather than weaken it. Each image patch is reconstructed from the corresponding spatially distributed feature vector of the learned feature representation, i.e., patch-wise reconstruction, which ensures the anomaly sensitivity of the AE. Our method is simple and efficient. It advances the state-of-the-art performance on the MVTec AD benchmark, which demonstrates the effectiveness of our model, and it shows great potential for practical industrial application scenarios.
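The TL;DR describes detection by nearest-neighbor distances in feature space, yielding a patch-wise anomaly map and a final image score. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a hypothetical `memory` list of feature vectors collected from normal images, uses Euclidean distance, and takes the maximum patch score as the image score; the paper's actual distance and aggregation may differ.

```python
import math

def nearest_neighbor_distance(vec, memory):
    """Smallest Euclidean distance from `vec` to any stored normal feature."""
    return min(math.dist(vec, m) for m in memory)

def anomaly_map(features, memory):
    """Score every spatial location of an H x W grid of feature vectors."""
    return [[nearest_neighbor_distance(v, memory) for v in row]
            for row in features]

def image_score(score_map):
    """Assumed aggregation: the largest patch score in the map."""
    return max(max(row) for row in score_map)

# Toy data (hypothetical): three normal feature vectors in 2-D.
memory = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# A 2 x 2 grid of test features; the bottom-right one lies far from `memory`.
features = [
    [(0.0, 0.1), (1.0, 0.0)],
    [(0.1, 1.0), (3.0, 4.0)],
]

scores = anomaly_map(features, memory)
final_score = image_score(scores)
```

In this toy grid, the location holding `(3.0, 4.0)` receives the largest distance and therefore determines the image score, while locations close to a stored normal feature score near zero.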

Paper Structure

This paper contains 10 sections, 3 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: The overview of our Patch-wise Auto-Encoder anomaly detection pipeline.
  • Figure 2: The detailed structure of Patch AE, consisting of an encoder and a decoder, which are marked in red and pink boxes, respectively. $\bigotimes$ denotes a data augmentation operation applied to the input image. $\bigoplus$ denotes feature concatenation after unifying dimensions. Blue rectangles represent network layers initialized with pre-trained parameters. Green ones are the two $1 \times 1$ convolution - ReLU activation layers of the encoder. Yellow ones are the $1 \times 1$ convolution, activation, and $1 \times 1$ convolution layers of the decoder. Purple annotations indicate the correspondence between a feature vector and its reconstructed image patch.
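
The one-to-one correspondence between a feature vector and its reconstructed patch (Figure 2) can be illustrated with a small sketch. A $1 \times 1$ convolution acts as the same linear map applied independently at every spatial location, so a decoder made of $1 \times 1$ convolution, activation, and $1 \times 1$ convolution can only use one feature vector to produce one patch. The code below assumes ReLU as the activation, a single-channel image, and an output of `patch * patch` values per location that is reshaped into a tile; these choices, the function names, and the toy weights are all hypothetical and not taken from the paper.

```python
def linear(vec, weight, bias):
    """Per-location linear map, equivalent to a 1x1 convolution at one pixel."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weight, bias)]

def relu(vec):
    return [max(0.0, x) for x in vec]

def decode_patch(vec, w1, b1, w2, b2, patch):
    """Decode one feature vector into one patch x patch tile."""
    hidden = relu(linear(vec, w1, b1))   # 1x1 conv + activation (ReLU assumed)
    flat = linear(hidden, w2, b2)        # 1x1 conv with patch*patch outputs
    return [flat[r * patch:(r + 1) * patch] for r in range(patch)]

def decode_image(features, w1, b1, w2, b2, patch):
    """Place each decoded tile at the position of its source feature vector."""
    h, w = len(features), len(features[0])
    image = [[0.0] * (w * patch) for _ in range(h * patch)]
    for i in range(h):
        for j in range(w):
            tile = decode_patch(features[i][j], w1, b1, w2, b2, patch)
            for r in range(patch):
                for c in range(patch):
                    image[i * patch + r][j * patch + c] = tile[r][c]
    return image

# Toy weights (hypothetical): 2-D features, 2 hidden units, 2x2 patches.
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
b2 = [0.0, 0.0, 0.0, 0.0]

# A 1 x 2 feature grid decodes to a 2 x 4 image, one tile per feature vector.
features = [[[1.0, 2.0], [-1.0, 3.0]]]
image = decode_image(features, w1, b1, w2, b2, patch=2)
```

Because no decoded value depends on a neighboring feature vector, changing one feature vector alters only its own tile, which is the property the purple annotations in Figure 2 point to.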