AMI-Net: Adaptive Mask Inpainting Network for Industrial Anomaly Detection and Localization

Wei Luo, Haiming Yao, Wenyong Yu, Zhengyong Li

TL;DR

AMI-Net tackles unsupervised industrial anomaly detection by reframing reconstruction as adaptive mask inpainting guided by multi-scale semantic features. During training it masks patches at randomized positions and in randomized quantities. At inference an adaptive mask generator decides what to mask. These components are combined with a ViT-based inpainting network and a clustering-driven token framework to suppress defect reconstruction and enable precise localization. On MVTec AD and BTAD, AMI-Net achieves strong localization and competitive image-level detection while running in real time, and ablations guide design choices such as the backbone, patch size, and clustering parameters. The approach shows robust cross-dataset and few-shot capabilities, and the authors discuss extensions to scenarios with partial abnormal data and future improvements for subtle defects.
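
The training-time masking randomizes both where masks fall and how many there are. Below is a minimal sketch of that idea. It assumes masking acts on a flat grid of patch tokens and that the masked fraction is drawn uniformly from a range. The function name `random_patch_mask` and the ratio bounds are hypothetical illustrations, not values from the paper.

```python
import numpy as np

def random_patch_mask(num_patches, min_ratio=0.1, max_ratio=0.7, rng=None):
    """Hypothetical random positional and quantitative mask (True = masked).

    Quantity: the masked fraction is resampled on every call.
    Position: the masked patch indices are drawn without replacement.
    The ratio bounds are illustrative assumptions, not the paper's values.
    """
    if not 0.0 < min_ratio <= max_ratio <= 1.0:
        raise ValueError("need 0 < min_ratio <= max_ratio <= 1")
    rng = np.random.default_rng() if rng is None else rng
    ratio = rng.uniform(min_ratio, max_ratio)              # random quantity
    num_masked = max(1, int(round(ratio * num_patches)))
    idx = rng.choice(num_patches, size=num_masked, replace=False)  # random positions
    mask = np.zeros(num_patches, dtype=bool)
    mask[idx] = True
    return mask

rng = np.random.default_rng(0)
mask = random_patch_mask(16 * 16, rng=rng)  # e.g. a 16x16 patch grid
print(mask.sum(), mask.shape)
```

Because a fresh ratio is drawn on every call, the inpainting network sees both small and large masked areas. This matches the paper's motivation that industrial defects vary in scale.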

Abstract

Unsupervised visual anomaly detection is crucial for improving industrial production quality and efficiency. Among unsupervised methods, reconstruction-based approaches are popular because of their simplicity and effectiveness. The crux of reconstruction methods is restoring anomalous regions to a normal appearance, which current methods have not achieved satisfactorily. To address this issue, we introduce a novel Adaptive Mask Inpainting Network (AMI-Net) that approaches reconstruction as adaptive mask inpainting. Unlike traditional reconstruction methods that use raw image pixels as targets, our method uses a pre-trained network to extract multi-scale semantic features as reconstruction targets. Given the multi-scale nature of industrial defects, we adopt a training strategy that randomizes both the positions and the number of masks. Moreover, we propose an adaptive mask generator that produces masks covering anomalous regions while leaving normal regions visible. The model can then use the visible normal global context to restore the masked anomalous regions, effectively suppressing the reconstruction of defects. Extensive experiments on the MVTec AD and BTAD industrial datasets validate the effectiveness of the proposed method. AMI-Net also runs in real time, striking a favorable balance between detection accuracy and speed that makes it well suited to industrial applications. Code is available at: https://github.com/luow23/AMI-Net
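
The adaptive mask generator hides anomalous regions and keeps normal ones visible. Figure 4 shows it building a distance map between latent features and their cluster centers. The sketch below assumes the generator simply thresholds each patch's distance to its nearest cluster center. The name `adaptive_mask`, the Euclidean distance and the percentile threshold are all hypothetical choices. The sketch also ignores positional information, which Figure 5 indicates matters for the paper's clustering.

```python
import numpy as np

def adaptive_mask(features, centers, threshold):
    """Hypothetical adaptive mask: flag patches far from every cluster center.

    features: (N, C) patch features of a test image.
    centers:  (K, C) cluster centers fitted on normal training features.
    Returns (distance_map, mask); mask[i] is True if patch i is masked.
    """
    # Euclidean distance from every patch to every center, shape (N, K).
    diff = features[:, None, :] - centers[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    # Distance to the nearest center serves as the per-patch distance map.
    nearest = dists.min(axis=1)
    return nearest, nearest > threshold

rng = np.random.default_rng(0)
centers = rng.normal(size=(4, 8))
# Synthetic "normal" patches scattered tightly around the centers.
normal = centers[rng.integers(0, 4, size=50)] + 0.05 * rng.normal(size=(50, 8))
# Assumption: threshold = 99th percentile of distances on normal data.
normal_dist, _ = adaptive_mask(normal, centers, np.inf)
thr = np.quantile(normal_dist, 0.99)

# Shift the first three patches far away to imitate a defect.
test = normal.copy()
test[:3] += 5.0
dist_map, mask = adaptive_mask(test, centers, thr)
print(mask[:5])
```

Patches that land far from every normal cluster are masked. The remaining visible patches supply the normal context used for inpainting.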

Paper Structure

This paper contains 43 sections, 18 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Comparison of different unsupervised anomaly detection methods. (a) Vanilla autoencoder (AE). (b) Existing mask-based method RIAD. (c) The proposed method (AMI-Net). Note that our model is trained for feature reconstruction. A separate decoder renders images from the features and is used only for visualization.
  • Figure 2: Schematic diagrams of different methods. (a) The normal-data-based method trains only on normal samples. However, it still reconstructs defects during testing, because neural networks generalize. (b) The artificial-defect-based method trains on artificial defect samples. However, because artificial defects lack realism, real defects are still reconstructed during testing. (c) The existing mask-based method uses random masking during training. However, during testing, random masks fail to fully cover the defective areas, so the defects are reconstructed. (d) Our method randomizes mask positions and quantities during training. During testing, it generates adaptive masks for defect images that cover all defect regions, enabling defect restoration.
  • Figure 3: Overall architecture of the proposed AMI-Net. First, multi-scale features are extracted using a pretrained CNN. (a) During training, AMI-Net performs the inpainting task with masks whose positions and quantities are randomized. (b) During testing, AMI-Net uses an adaptive mask generator to create a mask that conceals the defective region while preserving the normal area. The inpainting network is then applied to obtain an anomaly-free reconstructed feature. Finally, defects are localized by comparing the input feature with the reconstructed feature.
  • Figure 4: Examples of the effectiveness of the adaptive mask generator. First row: the defective image. Second row: the distance map formed by the distances between latent features and their corresponding cluster centers. Third row: the adaptive mask. Final row: the corresponding label.
  • Figure 5: Issues arising from clustering methods that do not take positional information into account.
  • ...and 6 more figures
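
Figure 3 ends by localizing defects through a comparison of the input feature and the reconstructed feature. Below is a minimal sketch of that comparison. It assumes a single feature scale, a per-location cosine distance, nearest-neighbour upsampling to image resolution, and an image score equal to the map's maximum. These are hypothetical choices, and `anomaly_map` is an illustrative name, not the paper's API.

```python
import numpy as np

def anomaly_map(feat_in, feat_rec, out_h, out_w):
    """Hypothetical localization step.

    feat_in, feat_rec: (C, H, W) input and reconstructed feature maps.
    Returns an (out_h, out_w) map of 1 - cosine similarity per location,
    upsampled by nearest-neighbour repetition.
    """
    eps = 1e-8
    dot = (feat_in * feat_rec).sum(axis=0)
    norms = np.linalg.norm(feat_in, axis=0) * np.linalg.norm(feat_rec, axis=0)
    amap = 1.0 - dot / (norms + eps)  # (H, W): ~0 same direction, ~2 opposite
    h, w = amap.shape
    if out_h % h or out_w % w:
        raise ValueError("output size must be an integer multiple of the feature size")
    return np.repeat(np.repeat(amap, out_h // h, axis=0), out_w // w, axis=1)

rng = np.random.default_rng(0)
feat_in = rng.normal(size=(8, 4, 4))
feat_rec = feat_in.copy()
# Simulate a defective location where the reconstruction differs from the input.
feat_rec[:, 1, 2] *= -1.0
amap = anomaly_map(feat_in, feat_rec, 16, 16)
image_score = amap.max()  # assumption: image-level score = max of the map
```

Where the inpainting network restored a defect to normal, the reconstructed feature diverges from the input feature, so that location stands out in the map.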