Anomaly Detection with Conditioned Denoising Diffusion Models
Arian Mousakhan, Thomas Brox, Jawad Tayyub
TL;DR
DDAD introduces a conditioned denoising diffusion framework for anomaly detection that guides reconstruction toward a nominal target image, enabling defect-free reconstructions and accurate anomaly localisation through pixel-wise and feature-wise comparisons. An unsupervised domain adaptation module tailors a pretrained feature extractor to the problem domain, boosting feature-based anomaly scores. Across MVTec, VisA, and MTD datasets, DDAD achieves state-of-the-art image AUROC (up to 99.8%) and robust localisation while offering a lightweight variant (DDAD-S) for edge devices. The work demonstrates that conditioning diffusion models and adapting feature representations significantly improve reconstruction quality and anomaly detection performance beyond existing reconstruction- and representation-based methods.
Abstract
Traditional reconstruction-based methods have struggled to achieve competitive performance in anomaly detection. In this paper, we introduce Denoising Diffusion Anomaly Detection (DDAD), a novel denoising process for image reconstruction conditioned on a target image. This ensures a coherent restoration that closely resembles the target image. Our anomaly detection framework employs the conditioning mechanism, where the target image is set as the input image to guide the denoising process, leading to a defect-free reconstruction while maintaining nominal patterns. Anomalies are then localised via a pixel-wise and feature-wise comparison of the input and reconstructed image. Finally, to enhance the effectiveness of the feature-wise comparison, we introduce a domain adaptation method that utilises nearly identical generated examples from our conditioned denoising process to fine-tune the pretrained feature extractor. The effectiveness of DDAD is demonstrated on various datasets including the MVTec and VisA benchmarks, achieving state-of-the-art image-level AUROC of 99.8% and 98.9%, respectively.
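The pixel-wise and feature-wise comparison described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the use of an L1 pixel distance and a cosine feature distance, the weight `v`, and the rescaling of the pixel map to the range of the feature map are all assumptions made for illustration. The feature maps `f` and `f_rec` are assumed to come from some pretrained extractor and to be already resized to the image resolution.

```python
import numpy as np

def pixel_distance(x, x_rec):
    """Per-pixel L1 distance averaged over channels; inputs are (C, H, W)."""
    return np.abs(x - x_rec).mean(axis=0)

def feature_distance(f, f_rec, eps=1e-8):
    """Per-location cosine distance between feature maps of shape (D, H, W)."""
    dot = (f * f_rec).sum(axis=0)
    norm = np.linalg.norm(f, axis=0) * np.linalg.norm(f_rec, axis=0)
    return 1.0 - dot / (norm + eps)

def anomaly_map(x, x_rec, f, f_rec, v=1.0, eps=1e-8):
    """Combine both distances into one (H, W) anomaly map.

    The pixel map is rescaled to the range of the feature map before
    summing; this weighting (and the parameter v) is an assumption.
    """
    d_p = pixel_distance(x, x_rec)
    d_f = feature_distance(f, f_rec)
    scale = v * d_f.max() / (d_p.max() + eps)
    return scale * d_p + d_f
```

Given an input image and its reconstruction, `anomaly_map` returns one score per pixel. Taking the maximum of the map is one simple way to obtain an image-level score, although other aggregations are possible.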
