Breaking the Bias: Recalibrating the Attention of Industrial Anomaly Detection

Xin Chen, Liujuan Cao, Shengchuan Zhang, Xiewu Zheng, Yan Zhang

TL;DR

Unsupervised industrial anomaly detection suffers from attention bias toward variable regions in normal samples. The paper proposes RAAD, a two-stage framework that first reduces this bias via hierarchical quantization and then enhances defect sensitivity through fine-tuning. It is guided by Hierarchical Quantization Scoring (HQS), which allocates bit-widths across network layers. Evaluations across 32 datasets (including MVTec AD, MVTec LOCO-AD, and VisA) show that RAAD improves both image-level detection (AUROC) and pixel-level localization (AU-PRO) and outperforms state-of-the-art baselines, while enabling efficient on-device inference through mixed-precision quantization. The approach combines a lightweight teacher-student PDN with an autoencoder to balance local and global cues, delivering robust performance with reduced computational demands for industrial inspection tasks.

Abstract

Due to the scarcity and unpredictable nature of defect samples, industrial anomaly detection (IAD) predominantly employs unsupervised learning. However, all unsupervised IAD methods face a common challenge: the inherent bias in normal samples, which causes models to focus on variable regions while overlooking potential defects in invariant areas. To overcome this, it is essential to decompose and recalibrate attention, guiding the model to suppress irrelevant variations and concentrate on subtle, defect-susceptible areas. In this paper, we propose Recalibrating Attention of Industrial Anomaly Detection (RAAD), a framework that systematically decomposes and recalibrates attention maps. RAAD employs a two-stage process: first, it reduces attention bias through quantization, and second, it fine-tunes defect-prone regions for improved sensitivity. Central to this framework is Hierarchical Quantization Scoring (HQS), which dynamically allocates bit-widths across layers based on their anomaly detection contributions. HQS adjusts bit-widths according to the hierarchical nature of attention maps, compressing lower layers that produce coarse and noisy attention while preserving deeper layers with sharper, defect-focused attention. This approach optimizes both computational efficiency and the model's sensitivity to anomalies. We validate the effectiveness of RAAD on 32 datasets using a single RTX 3090 Ti GPU. Experiments demonstrate that RAAD balances the complexity and expressive power of the model, enhancing its anomaly detection capability.
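As a rough illustration of the bit-width allocation idea described in the abstract, the sketch below maps per-layer contribution scores to bit-widths through a piecewise function. The breakpoints, the candidate bit-widths (4, 8, 16), the min-max normalization, and the function names are all assumptions made for this example; the summary does not state the values or the exact scoring rule used by HQS.

```python
from typing import List, Sequence, Tuple

# Hypothetical breakpoints: (upper bound on normalized score, bit-width).
# These values are illustrative only and are not taken from the paper.
BREAKPOINTS: Sequence[Tuple[float, int]] = ((0.33, 4), (0.66, 8))
MAX_BITS = 16  # assumed bit-width for the highest-scoring layers


def normalize(scores: Sequence[float]) -> List[float]:
    """Min-max normalize scores to [0, 1]; equal scores all map to 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def score_to_bits(score: float) -> int:
    """Piecewise-constant map from a normalized score to a bit-width."""
    for upper, bits in BREAKPOINTS:
        if score < upper:
            return bits
    return MAX_BITS


def allocate_bit_widths(layer_scores: Sequence[float]) -> List[int]:
    """Assign more bits to layers with larger contribution scores."""
    return [score_to_bits(s) for s in normalize(layer_scores)]


if __name__ == "__main__":
    # Made-up scores for four layers, ordered from shallow to deep.
    print(allocate_bit_widths([0.02, 0.05, 0.20, 0.40]))
```

Under these assumed breakpoints, low-scoring layers are compressed most aggressively and the highest-scoring layer keeps the widest representation, which mirrors the qualitative behavior the abstract describes.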

Paper Structure

This paper contains 15 sections, 6 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Visualization of heatmaps. The samples are from the MVTec AD and MVTec LOCO-AD datasets. Shown are example industrial products, the average heatmap of normal samples, and the average heatmap of anomalous samples, respectively. The comparison clearly shows the bias contained in the normal samples relative to the anomalous samples.
  • Figure 2: (a) Visualization of the attention maps at different stages of the model; from left to right: the anomaly image, the ground truth, and the predicted anomaly score. (b) The layer-wise attention outputs, demonstrating the varying importance of each layer in anomaly detection.
  • Figure 3: Pipeline of RAAD. Our architecture consists of three components: the teacher model, the student model, and the autoencoder. During training and fine-tuning, we use only normal images. The process is divided into three steps: 1. Initial training of the model. 2. Decomposition of the attention map via Hierarchical Quantization Scoring, detailed in Figure 4. 3. Fine-tuning for attention recalibration.
  • Figure 4: Hierarchical Quantization Scoring (HQS) module. The teacher and student models are aligned layer by layer, and anomaly scores are calculated from the outputs of their respective convolutional layers. These scores are then converted into quantization bit-widths through a piecewise function. The lower part of the figure shows the details of the teacher-student network (PDN).
  • Figure 5: The inference process of the model. The input is from the MVTec AD test set. "Diff" refers to computing the element-wise squared difference between two collections of output feature maps and averaging it across feature maps. To obtain pixel-level anomaly scores, the anomaly maps are resized to match the input image using bilinear interpolation.
  • ...and 2 more figures
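
The "Diff" operation and the resizing step described in the Figure 5 caption can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, feature maps are represented as nested lists of shape [channels][height][width], and the bilinear resize assumes half-pixel-centre alignment, which the caption does not specify. A real pipeline would normally use a tensor library instead.

```python
from typing import List

Map2D = List[List[float]]
Features = List[Map2D]  # [channels][height][width]


def diff_map(a: Features, b: Features) -> Map2D:
    """Element-wise squared difference, averaged across feature maps."""
    c, h, w = len(a), len(a[0]), len(a[0][0])
    return [
        [sum((a[k][i][j] - b[k][i][j]) ** 2 for k in range(c)) / c
         for j in range(w)]
        for i in range(h)
    ]


def _source_coord(dst: int, in_size: int, out_size: int) -> float:
    """Map an output index to a clamped input coordinate (half-pixel centres)."""
    src = (dst + 0.5) * in_size / out_size - 0.5
    return min(max(src, 0.0), in_size - 1.0)


def resize_bilinear(m: Map2D, out_h: int, out_w: int) -> Map2D:
    """Resize a 2D map to (out_h, out_w) with bilinear interpolation."""
    in_h, in_w = len(m), len(m[0])
    out: Map2D = []
    for i in range(out_h):
        y = _source_coord(i, in_h, out_h)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = _source_coord(j, in_w, out_w)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = m[y0][x0] * (1 - fx) + m[y0][x1] * fx
            bottom = m[y1][x0] * (1 - fx) + m[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


if __name__ == "__main__":
    # Toy 2-channel, 2x2 feature maps standing in for two network outputs.
    first = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
    second = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    pixel_scores = resize_bilinear(diff_map(first, second), 4, 4)
    print(pixel_scores)
```

Here `diff_map` reduces the channel dimension to a single low-resolution anomaly map, and `resize_bilinear` brings that map up to the input resolution so each pixel receives a score.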