Anomaly Detection with Conditioned Denoising Diffusion Models

Arian Mousakhan, Thomas Brox, Jawad Tayyub

TL;DR

DDAD introduces a conditioned denoising diffusion framework for anomaly detection that conditions reconstruction on the input (target) image, yielding defect-free reconstructions and accurate anomaly localisation through pixel-wise and feature-wise comparisons. An unsupervised domain adaptation module tailors a pretrained feature extractor to the problem domain, improving feature-based anomaly scores. Across the MVTec, VisA, and MTD datasets, DDAD achieves state-of-the-art image-level AUROC (up to 99.8%) and robust localisation, and a lightweight variant (DDAD-S) targets edge devices. The work shows that conditioning the diffusion model and adapting the feature representation improve reconstruction quality and anomaly detection performance beyond existing reconstruction- and representation-based methods.
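Image-level AUROC, the headline metric above, can be read as the probability that a randomly chosen anomalous image receives a higher anomaly score than a randomly chosen nominal one. The sketch below computes it directly from that pairwise definition, counting ties as one half. It is a generic illustration rather than the authors' evaluation code, and the function name image_auroc is hypothetical. How DDAD turns a pixel-level anomaly map into a single image score is not stated in this summary, so the per-image scores are simply taken as given.

```python
import numpy as np

def image_auroc(scores, labels):
    """AUROC as P(score of anomalous image > score of nominal image), ties counted as 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)  # True = anomalous, False = nominal
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one anomalous and one nominal image")
    # Compare every anomalous score with every nominal score via broadcasting.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```

For example, image_auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]) returns 0.75, because three of the four anomalous/nominal pairs are ranked correctly. The pairwise comparison uses memory proportional to the number of anomalous images times the number of nominal images, which is acceptable for benchmark-sized test sets; a rank-based formula scales better for very large ones.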

Abstract

Traditional reconstruction-based methods have struggled to achieve competitive performance in anomaly detection. In this paper, we introduce Denoising Diffusion Anomaly Detection (DDAD), a novel denoising process for image reconstruction conditioned on a target image. This ensures a coherent restoration that closely resembles the target image. Our anomaly detection framework employs this conditioning mechanism by setting the target image to the input image, which guides the denoising process towards a defect-free reconstruction that preserves nominal patterns. Anomalies are then localised via a pixel-wise and feature-wise comparison of the input and reconstructed images. Finally, to enhance the effectiveness of the feature-wise comparison, we introduce a domain adaptation method that utilises nearly identical generated examples from our conditioned denoising process to fine-tune the pretrained feature extractor. The effectiveness of DDAD is demonstrated on various datasets including the MVTec and VisA benchmarks, where it achieves state-of-the-art image-level AUROC of 99.8% and 98.9%, respectively.
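The key idea in the abstract is that the denoising trajectory is steered by a target image, and that at test time the target is the input itself. The sketch below shows one way such a conditioned step could look on top of a deterministic DDIM update. The guidance rule is an assumption, not the paper's exact equation: the model's noise estimate is corrected by a term proportional to the difference between the target noised to the current level, y_t, and the current sample x_t, scaled by a hypothetical weight w. eps_model stands in for the trained denoising U-Net, and t_start, n_steps and w are illustrative values only.

```python
import numpy as np

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear DDPM noise schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def conditioned_ddim_step(x_t, y, t, t_prev, eps_model, alpha_bar, w):
    """One deterministic DDIM step whose noise estimate is pulled toward target y.

    ASSUMPTION: eps_hat = eps - w * sqrt(1 - a_t) * (y_t - x_t) is an illustrative
    guidance rule; the exact form used by DDAD may differ.
    """
    a_t = alpha_bar[t]
    a_prev = alpha_bar[t_prev] if t_prev >= 0 else 1.0  # t_prev = -1 means "clean image"
    eps = eps_model(x_t, t)
    # Noise the target to level t using the model's own noise estimate.
    y_t = np.sqrt(a_t) * y + np.sqrt(1.0 - a_t) * eps
    # Correct the noise estimate so the predicted clean image moves toward y.
    eps_hat = eps - w * np.sqrt(1.0 - a_t) * (y_t - x_t)
    # Standard DDIM update (eta = 0): predict x_0, then re-noise to level t_prev.
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps_hat) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps_hat

def reconstruct(x, eps_model, alpha_bar, t_start=250, n_steps=10, w=3.0, seed=0):
    """Perturb x to noise level t_start, then denoise with x itself as the target."""
    rng = np.random.default_rng(seed)
    a = alpha_bar[t_start]
    x_t = np.sqrt(a) * x + np.sqrt(1.0 - a) * rng.standard_normal(x.shape)
    # Evenly spaced timesteps from t_start down to 0, then a final step to the clean image.
    ts = np.linspace(t_start, 0, n_steps).astype(int).tolist() + [-1]
    for t, t_prev in zip(ts[:-1], ts[1:]):
        x_t = conditioned_ddim_step(x_t, x, t, t_prev, eps_model, alpha_bar, w)
    return x_t
```

As a sanity check, an oracle noise predictor for a clean image x0 (one that returns exactly the noise mixed into x0) makes y_t equal x_t, so the correction vanishes and reconstruct returns x0. With a real U-Net trained only on nominal images, the intent is that the model's estimate removes anomalous structure while w keeps the result close to the input elsewhere; Figure 3 (top) shows how the conditioning parameter affects reconstructions.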

Paper Structure

This paper contains 32 sections, 10 equations, 12 figures, 19 tables, and 1 algorithm.

Figures (12)

  • Figure 1: Our approach produces defect-free reconstructions of input images, i.e. reconstructions devoid of anomalies, from which an accurate anomaly heatmap is computed. Note that the reconstructions approximate the expected nominal version of each input. In the cable category, an incorrectly placed green cable has been corrected to a blue one by the model. Such corrected images may further benefit industry, for example in repairing defects or training workers.
  • Figure 2: Framework of DDAD. After a denoising U-Net has been trained, the feature extractor is adapted to the problem domain by minimising the distance between the extracted features of a target image and a generated image that resembles it. At inference time, after perturbing the input image, the denoising process is conditioned on the same input image to produce an anomaly-free reconstruction. Finally, the reconstructed image is compared with the input through both pixel-wise and feature-wise matching to produce an accurate anomaly localisation.
  • Figure 3: Top: Influence of the conditioning parameter on reconstruction outcomes. Bottom: The first row illustrates a case where pixel-wise comparison fails, while the second row shows a failure of feature-wise comparison. Combining both comparisons leads to accurate detection in both cases.
  • Figure 4: Effectiveness of various components of our model on anomaly detection and segmentation. Left: Effectiveness of conditioning when only pixel-wise image comparison is used. Middle: Performance increase due to domain adaptation of the feature extractor, with conditioning applied for reconstruction. Right: Impact of merging feature-wise and pixel-wise image comparison. All results are on the MVTec dataset bergmann2019mvtec.
  • Figure 5: The first and second rows show samples of 'metal nut', 'capsule', 'transistor', and 'grid' selected from MVTec bergmann2019mvtec. The third and fourth rows show samples of 'pcb4', 'chewing gum', 'pcb3', and 'capsules' selected from VisA zou2022spot.
  • Figures 6 to 12 are not listed here.
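Figures 2 and 3 describe the final scoring step: the reconstruction is compared with the input both pixel-wise and in feature space, and the two comparisons are combined because each one fails on some defects. The sketch below is a minimal version under stated assumptions: per-pixel L1 distance averaged over channels, one minus cosine similarity between feature maps at each spatial location, nearest-neighbour upsampling of the feature distance to image resolution, and a hypothetical fixed weight v. The feature maps f and f_rec would come from the (domain-adapted) feature extractor applied to the input and its reconstruction; here they are plain arrays, and all function names are illustrative.

```python
import numpy as np

def pixel_distance(x, x_rec):
    """Per-pixel L1 distance averaged over channels; x and x_rec have shape (C, H, W)."""
    return np.abs(x - x_rec).mean(axis=0)

def feature_distance(f, f_rec, out_hw):
    """1 - cosine similarity per location of (C, h, w) feature maps, upsampled to out_hw."""
    num = (f * f_rec).sum(axis=0)
    den = np.linalg.norm(f, axis=0) * np.linalg.norm(f_rec, axis=0) + 1e-8
    d = 1.0 - num / den
    # Nearest-neighbour upsampling: output row i reads feature row i * h // H.
    H, W = out_hw
    rows = np.arange(H) * d.shape[0] // H
    cols = np.arange(W) * d.shape[1] // W
    return d[rows][:, cols]

def anomaly_map(x, x_rec, f, f_rec, v=0.5):
    """Weighted sum of pixel-wise and feature-wise distances (v is a hypothetical weight)."""
    d_pix = pixel_distance(x, x_rec)
    d_feat = feature_distance(f, f_rec, d_pix.shape)
    return v * d_pix + (1.0 - v) * d_feat
```

A per-image score could then be taken as, for example, the maximum of the map (an assumption). Because the two distances can live on very different scales, the choice of v, or a normalisation estimated on nominal images, matters in practice; normalising each map to its own [0, 1] range would erase the differences between images that the image-level score relies on, so it is avoided here.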