Not All Regions Are Equal: Attention-Guided Perturbation Network for Industrial Anomaly Detection

Tingfeng Huang, Weijia Kong, Yuxuan Cheng, Jingbo Xia, Rui Yu, Jinhai Xiang, Xinwei He

TL;DR

AGPNet tackles reconstruction-based unsupervised industrial anomaly detection by introducing an auxiliary attention-guided perturbation branch that generates sample-specific masks to aggressively perturb crucial regions during training. Leveraging a frozen DINO-based backbone and a lightweight ViT decoder, paired with momentum-distilled attention cues, the method enforces learning of invariant normal patterns and suppresses anomaly reconstruction. It achieves leading results across MVTec-AD, VisA, and MVTec-3D in multi-class, one-class, and few-shot scenarios, while maintaining reasonable efficiency. The approach offers strong practical impact for robust, scalable industrial defect detection with reduced reliance on extensive anomaly data.

Abstract

In unsupervised image anomaly detection, reconstruction methods aim to train models to capture normal patterns comprehensively for normal data reconstruction. Yet, these models sometimes retain unintended reconstruction capacity for anomalous regions during inference, leading to missed detections. To mitigate this issue, existing works perturb normal samples in a sample-agnostic manner, uniformly adding noise across spatial locations before reconstructing the original. Despite promising results, they disregard the fact that foreground locations are inherently more critical for robust reconstruction. Motivated by this, we present a novel reconstruction framework named Attention-Guided Perturbation Network (AGPNet) for industrial anomaly detection. Its core idea is to add perturbations guided by a sample-aware attention mask to improve the learning of invariant normal patterns at important locations. AGPNet consists of two branches, i.e., a reconstruction branch and an auxiliary attention-based perturbation one. The reconstruction branch learns to reconstruct normal samples, while the auxiliary one aims to produce attention masks to guide the noise perturbation process for normal samples. By perturbing more aggressively at those important regions, we encourage the reconstruction branch to learn inherent normal patterns both comprehensively and robustly. Extensive experiments are conducted on several popular benchmarks covering MVTec-AD, VisA, and MVTec-3D, and show that AGPNet consistently obtains leading anomaly detection performance across a variety of setups, including few-shot, one-class, and multi-class ones.
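
The core idea in the abstract, perturbing more aggressively where a sample-aware attention mask is high, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `attention_guided_perturb`, the Gaussian noise model, the min-max normalisation, and the `max_sigma` parameter are all assumptions made for illustration, and the attention map is supplied by hand rather than produced by a network branch.

```python
import numpy as np

def attention_guided_perturb(image, attention, max_sigma=0.5, seed=0):
    """Add Gaussian noise whose strength follows a per-sample attention map.

    image:     float array of shape (H, W, C)
    attention: float array of shape (H, W), larger = more important
    (Hypothetical helper; the noise model is an assumption.)
    """
    rng = np.random.default_rng(seed)
    # Min-max normalise the attention map to [0, 1] so it can act as a mask.
    a_min, a_max = attention.min(), attention.max()
    mask = (attention - a_min) / (a_max - a_min + 1e-8)
    # Per-pixel noise scale: high-attention locations receive stronger noise.
    sigma = max_sigma * mask[..., None]
    noise = rng.standard_normal(image.shape) * sigma
    return image + noise, mask

# Toy example: the "foreground" is a square in the centre of the image.
image = np.zeros((8, 8, 3))
attention = np.zeros((8, 8))
attention[2:6, 2:6] = 1.0
perturbed, mask = attention_guided_perturb(image, attention)
```

In this toy case the background pixels are left untouched while the central square is noised, in contrast to a sample-agnostic scheme that would apply the same noise level everywhere.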

Paper Structure (21 sections, 8 equations, 5 figures, 9 tables)

This paper contains 21 sections, 8 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Comparison between our perturbation strategy and existing ones in the reconstruction paradigm for anomaly detection. Whereas (a) existing networks adopt fixed or random perturbations, (b) ours learns to guide the perturbation with sample-aware masks produced by the network, which helps learn a better reconstruction model.
  • Figure 2: A framework overview of AGPNet. It consists of two branches, i.e., the main reconstruction branch and the attention-guided perturbation branch. During training, given an input normal image $I_{normal}$, the main reconstruction branch is used for reconstruction, while the perturbation branch generates attention masks based on the main branch and applies perturbations at both the image and feature levels, making the reconstruction network focus on important local details. During inference, we keep only the main branch and generate the anomaly map by comparing the input and output of the decoder.
  • Figure 3: Qualitative illustration on the MVTec-AD dataset.
  • Figure 4: Qualitative illustration on the MVTec-AD dataset.
  • Figure 5: Qualitative illustration on the VisA dataset.
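
The inference step described in the Figure 2 caption, building an anomaly map by comparing the decoder's input and output, might be sketched as follows. The caption does not state the comparison metric, so the per-patch cosine distance, the nearest-neighbour upsampling, and the names `anomaly_map`, `grid`, and `scale` are assumptions for illustration only.

```python
import numpy as np

def anomaly_map(dec_in, dec_out, grid, scale):
    """Compare decoder input and output features patch by patch.

    dec_in, dec_out: (N, D) patch features; grid: (h, w) with h * w == N.
    Returns an (h * scale, w * scale) map; higher = more anomalous.
    (Hypothetical helper; cosine distance is an assumed metric.)
    """
    eps = 1e-8
    num = (dec_in * dec_out).sum(axis=1)
    den = np.linalg.norm(dec_in, axis=1) * np.linalg.norm(dec_out, axis=1) + eps
    dist = 1.0 - num / den  # cosine distance per patch
    patch_map = dist.reshape(grid)
    # Nearest-neighbour upsampling from the patch grid to pixel resolution.
    return np.repeat(np.repeat(patch_map, scale, axis=0), scale, axis=1)

rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 32))  # stand-in for decoder input features
recon = feats.copy()                   # stand-in for decoder output
recon[5] = -feats[5]                   # pretend patch 5 was reconstructed poorly
amap = anomaly_map(feats, recon, grid=(4, 4), scale=2)
```

Here only the poorly reconstructed patch lights up in `amap`; an image-level score could then be taken as, for example, the maximum of the map, which is again an assumption rather than something the captions specify.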