AMI-Net: Adaptive Mask Inpainting Network for Industrial Anomaly Detection and Localization
Wei Luo, Haiming Yao, Wenyong Yu, Zhengyong Li
TL;DR
AMI-Net tackles unsupervised industrial anomaly detection by reframing reconstruction as adaptive mask inpainting guided by multi-scale semantic features. It introduces a random positional and quantitative masking strategy during training and an adaptive mask generator for inference, combined with a ViT-based inpainting network and a clustering-driven token framework to suppress defect reconstruction and enable precise localization. Across MVTec AD and BTAD, AMI-Net achieves strong localization and competitive image-level detection while delivering real-time performance, with ablations guiding design choices such as backbone, patch size, and clustering parameters. The approach demonstrates robust cross-dataset and few-shot capabilities, and the authors discuss extensions to scenarios with partial abnormal data and future improvements for subtle defects.
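The training-time masking described above can be illustrated with a minimal sketch. It assumes the image is split into a ViT patch grid (for example 14×14 patches for a 224×224 input with 16×16 patches) and that "random positional and quantitative" masking means drawing both *how many* patches to mask and *where* they sit at random. The function name `random_patch_mask`, the uniform ratio range, and the per-patch granularity are illustrative assumptions, not the authors' implementation.

```python
import random
from typing import List, Optional


def random_patch_mask(grid_h: int, grid_w: int, min_ratio: float = 0.1,
                      max_ratio: float = 0.5,
                      seed: Optional[int] = None) -> List[List[bool]]:
    """Hypothetical sketch: mask a random number of patches at random positions.

    True marks a masked patch. The uniform ratio range is an assumption.
    """
    if not 0.0 <= min_ratio <= max_ratio <= 1.0:
        raise ValueError("require 0 <= min_ratio <= max_ratio <= 1")
    rng = random.Random(seed)
    n_patches = grid_h * grid_w
    # Quantitative part: how many patches to mask in this training sample.
    ratio = rng.uniform(min_ratio, max_ratio)
    n_masked = max(1, round(ratio * n_patches))
    # Positional part: which patches to mask, drawn without replacement.
    positions = rng.sample(range(n_patches), n_masked)
    mask = [[False] * grid_w for _ in range(grid_h)]
    for p in positions:
        mask[p // grid_w][p % grid_w] = True
    return mask


# Usage: a 14x14 grid with 20-40% of patches masked.
m = random_patch_mask(14, 14, min_ratio=0.2, max_ratio=0.4, seed=0)
print(sum(sum(row) for row in m), "patches masked")
```

Randomizing the count as well as the positions exposes the inpainting network to masked regions of many sizes, which matches the motivation that industrial defects vary in scale.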
Abstract
Unsupervised visual anomaly detection is crucial for enhancing industrial production quality and efficiency. Among unsupervised methods, reconstruction approaches are popular because of their simplicity and effectiveness. The key challenge for reconstruction methods is restoring anomalous regions to a normal appearance instead of reproducing the defects, which current methods have not satisfactorily achieved. To tackle this issue, we introduce a novel Adaptive Mask Inpainting Network (AMI-Net) that treats reconstruction as adaptive mask inpainting. In contrast to traditional reconstruction methods that use raw, non-semantic image pixels as targets, our method uses a pre-trained network to extract multi-scale semantic features as reconstruction targets. Given the multi-scale nature of industrial defects, we adopt a training strategy involving random positional and quantitative masking. Moreover, we propose an adaptive mask generator that produces masks covering anomalous regions while preserving normal regions. In this manner, the model can leverage the visible normal global context to restore the masked anomalous regions, thereby effectively suppressing the reconstruction of defects. Extensive experimental results on the MVTec AD and BTAD industrial datasets validate the effectiveness of the proposed method. Additionally, AMI-Net runs in real time, striking a favorable balance between detection accuracy and speed, which makes it well suited for industrial applications. Code is available at: https://github.com/luow23/AMI-Net
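To make the idea of an adaptive mask concrete, here is a minimal sketch of one plausible inference-time heuristic: a patch is masked when its semantic feature lies far from every "normal" prototype (for example, cluster centroids of features from defect-free training images). The prototype comparison, the Euclidean distance, the fixed `threshold`, and the function names are all assumptions made for illustration; this is not the paper's adaptive mask generator.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_centroid_distance(feature: Vector,
                              centroids: Sequence[Vector]) -> float:
    """Euclidean distance from one patch feature to its closest normal centroid."""
    return min(math.dist(feature, c) for c in centroids)


def adaptive_mask(patch_features: Sequence[Vector],
                  normal_centroids: Sequence[Vector],
                  threshold: float) -> List[bool]:
    """Hypothetical heuristic: mask patches whose features look abnormal.

    True means the patch is masked and must be inpainted from the
    visible (presumably normal) context; False keeps it visible.
    """
    if not normal_centroids:
        raise ValueError("need at least one normal centroid")
    return [nearest_centroid_distance(f, normal_centroids) > threshold
            for f in patch_features]


# Usage with toy 2-D features: the third patch is far from both centroids.
centroids = [[0.0, 0.0], [1.0, 1.0]]
features = [[0.1, 0.0], [0.9, 1.1], [5.0, 5.0]]
print(adaptive_mask(features, centroids, threshold=1.0))  # [False, False, True]
```

Because only suspicious patches are masked, the inpainting model sees as much normal context as possible, which is the property the abstract attributes to adaptive masking.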
