GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection

Hang Yao, Ming Liu, Haolin Wang, Zhicun Yin, Zifei Yan, Xiaopeng Hong, Wangmeng Zuo

TL;DR

GLAD introduces a global and local adaptive diffusion framework for unsupervised anomaly detection. By employing Adaptive Denoising Steps, Anomaly-oriented Training Paradigm, and Spatial-Adaptive Feature Fusion, it achieves anomaly-free reconstruction while preserving normal details. The approach yields state-of-the-art performance across four industrial datasets and provides robust anomaly scoring via multi-layer feature maps. This combination enhances both detection and localization, with practical implications for real-world quality control and fault detection. The work also discusses limitations such as inference-time overhead and outlines future work to improve efficiency and scalability.

Abstract

Diffusion models have shown superior performance on unsupervised anomaly detection tasks. Because they are trained with normal data only, diffusion models tend to reconstruct normal counterparts of test images to which a certain amount of noise has been added. However, these methods treat all potential anomalies equally, which may cause two main problems. From the global perspective, the difficulty of reconstructing images with different anomalies is uneven. Therefore, instead of using the same setting for all samples, we propose to predict a particular denoising step for each sample by evaluating the difference between image contents and the priors extracted from diffusion models. From the local perspective, reconstructing abnormal regions differs from reconstructing normal areas, even within the same image. Theoretically, the diffusion model predicts the noise at each step, which typically follows a standard Gaussian distribution. However, due to the difference between the anomaly and its potential normal counterpart, the predicted noise in abnormal regions will inevitably deviate from the standard Gaussian distribution. To this end, we propose introducing synthetic abnormal samples in training to encourage the diffusion models to break through the limitation of the standard Gaussian distribution, and we use a spatial-adaptive feature fusion scheme during inference. With the above modifications, we propose a global and local adaptive diffusion model (abbreviated to GLAD) for unsupervised anomaly detection, which introduces appealing flexibility and achieves anomaly-free reconstruction while retaining as much normal information as possible. Extensive experiments are conducted on three commonly used anomaly detection datasets (MVTec-AD, MPDD, and VisA) and a printed circuit board dataset (PCB-Bank) that we integrated, showing the effectiveness of the proposed method.
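
To make the global trade-off concrete, the following is a minimal sketch of the standard DDPM forward (noising) process, not the authors' implementation. It assumes a linear beta schedule with 1000 steps (beta from 1e-4 to 0.02), which the text above does not specify; the function names `alpha_bar` and `add_noise` are hypothetical. It shows that the share of the original image surviving at step t shrinks as t grows, which is why a large fixed step can remove severe anomalies but also erases normal details.

```python
import numpy as np

def alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule.

    The schedule and its endpoints are assumptions made for illustration.
    """
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, abar, rng):
    """Sample x_t from q(x_t | x_0) for a 1-indexed step t.

    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, I).
    """
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar[t - 1]) * x0 + np.sqrt(1.0 - abar[t - 1]) * eps

abar = alpha_bar()

# Weight on the original image at a small and at a large step.
signal_300 = float(np.sqrt(abar[299]))
signal_900 = float(np.sqrt(abar[899]))

# Noise a toy 8x8 "image" at both steps.
rng = np.random.default_rng(0)
x0 = np.ones((8, 8))
x_300 = add_noise(x0, 300, abar, rng)
x_900 = add_noise(x0, 900, abar, rng)
```

Under these assumptions `signal_300` is larger than `signal_900`, so a reconstruction started from step 300 is conditioned on far more of the input than one started from step 900. Choosing the step per sample, as the abstract proposes, avoids paying the large-step cost on images that only contain small anomalies.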

Paper Structure (28 sections, 13 equations, 9 figures, 14 tables)

Figures (9)

  • Figure 1: Illustration of the adaptive denoising process. Severe anomalies such as missing elements require a large number of denoising steps (900) to add the element back, while for small anomalies such as scratches, 300 steps are already enough. Moreover, setting a large denoising step (e.g., 900) for all samples harms detail preservation. For example, in the area bounded by red lines, the position of the element has changed, which will be marked as an anomaly during the comparison process.
  • Figure 2: The reconstruction pipeline of the proposed GLAD, including the Adaptive Denoising Steps and the Spatial-Adaptive Feature Fusion Scheme.
  • Figure 3: Reconstructions and qualitative comparisons with other methods. The first four rows show examples from the MVTec-AD dataset, and the last row is from the MPDD dataset. OCR-GAN only produces anomaly scores, so it has no anomaly map. SimpleNet is an embedding-based method.
  • Figure 4: Reconstructions of different types of anomalies and their proper steps. Examples (a), (c), and (e) contain small-scale anomalies, and (b), (d), and (f) contain large-scale anomalies. The numbers above the reconstructed images are the proper steps. Differences in the details of normal areas are marked with red circles.
  • Figure 5: Qualitative comparisons between baseline and proposed ATP on MVTec-AD. The same denoising steps are used for the two methods.
  • ...and 4 more figures