Dynamic Addition of Noise in a Diffusion Model for Anomaly Detection

Justin Tebbe, Jawad Tayyub

TL;DR

A novel framework that enhances the capability of diffusion models by extending the previously introduced implicit conditioning approach in three significant ways, including a dynamic step size computation that allows for a variable number of noising steps in the forward process, guided by an initial anomaly prediction.

Abstract

Diffusion models have found valuable applications in anomaly detection by capturing the nominal data distribution and identifying anomalies via reconstruction. Despite their merits, they struggle to localize anomalies of varying scales, especially larger anomalies such as entire missing components. To address this, we present a novel framework that enhances the capability of diffusion models by extending the previously introduced implicit conditioning approach (Meng et al., 2022) in three significant ways. First, we incorporate a dynamic step size computation that allows for a variable number of noising steps in the forward process, guided by an initial anomaly prediction. Second, we demonstrate that denoising an input that is only scaled, without any added noise, outperforms the conventional denoising process. Third, we project images into a latent space to abstract away from fine details that interfere with the reconstruction of large missing components. Additionally, we propose a fine-tuning mechanism that helps the model grasp the nuances of the target domain. Our method is evaluated on the prominent anomaly detection datasets VisA, BTAD and MVTec, yielding strong performance. Importantly, our framework effectively localizes anomalies regardless of their scale, marking a pivotal advancement in diffusion-based anomaly detection.
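
The second contribution can be illustrated with a minimal NumPy sketch of the perturbation step. It assumes the standard DDPM forward process with a linear beta schedule, which the abstract does not specify; the function names `make_alpha_bar` and `perturb` and all default values are hypothetical and not taken from the paper. With `add_noise=False` the input is only scaled by the square root of the cumulative alpha term, which is one plausible reading of "only scaled, without any added noise".

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def perturb(x0, t, alpha_bar, add_noise, rng=None):
    """Return the perturbed input at step t (0-indexed).

    add_noise=True : x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps
    add_noise=False: x_t = sqrt(a_bar_t) * x0  (scaling only, no noise)
    """
    scaled = np.sqrt(alpha_bar[t]) * x0
    if not add_noise:
        return scaled
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(x0.shape)
    return scaled + np.sqrt(1.0 - alpha_bar[t]) * eps

# Example: perturb a toy 4x4 "image" up to step 250 without adding noise.
alpha_bar = make_alpha_bar()
x0 = np.ones((4, 4))
x_scaled = perturb(x0, 250, alpha_bar, add_noise=False)
```

In this sketch the step index `t` would be the dynamically chosen noising step rather than a fixed constant; the scaled input would then be passed to the denoiser starting from that step.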

Paper Structure

This paper contains 24 sections, 11 equations, 15 figures, 9 tables, 2 algorithms.

Figures (15)

  • Figure 1: Dynamic conditioning, whereby the amount of added noise is a function of the input image and the training dataset, based on an initial guess of the severity of the anomaly.
  • Figure 2: Segmentation results of our dynamic approach for anomalies across scales from VisA and BTAD.
  • Figure 3: Reconstruction Architecture: An input ${\bm{x}}_0$ is fed to the DIC to determine the step $\hat{T}$ up to which it must be perturbed. ${\bm{x}}_0$ is also projected to a latent representation ${\bm{z}}_0$. Denoising is performed in the latent space, leading to a predicted latent $\hat{{\bm{z}}}_0$, which is decoded into a reconstruction $\hat{{\bm{x}}}_0$. DIC: The average distance of the extracted features of a test image to its K nearest neighbours from the training set is quantized, using equally sized predefined bins, to determine the dynamic noising step $\hat{T}$.
  • Figure 4: Overview of the Anomaly Map construction. The feature heatmap ($f_{map}$) is computed as the cosine distance between the features of the input ${\bm{x}}_0$ and its reconstruction $\hat{{\bm{x}}}_0$, whereas the latent heatmap ($l_{map}$) is calculated using an $\mathcal{L}_1$ distance between the corresponding latent representations of ${\bm{x}}_0$ and $\hat{{\bm{x}}}_0$. These are combined linearly to form the final anomaly heatmap ($A_{map}$).
  • Figure 5: Histogram of the binning values for the training set in blue and test set in orange, showing a distribution shift to larger values for the test set. Displayed are categories from VisA and BTAD.
  • ...and 10 more figures
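
The captions of Figures 3 and 4 describe two computations that can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the Euclidean distance for the nearest-neighbour search, the linear mapping from bin index to noising step, the mixing weight, and the assumption that feature maps and latents share one spatial size are all hypothetical. Only the overall recipe (mean K-nearest-neighbour distance quantized into equal-width bins, and a linear combination of a cosine-distance map with an L1 map) comes from the captions.

```python
import numpy as np

def dynamic_step(test_feat, train_feats, bin_edges, t_max, k=5):
    """Map the mean distance to the k nearest training features onto a noising step.

    test_feat: (D,) feature vector; train_feats: (N, D) training features.
    bin_edges: equally spaced, increasing edges defining the predefined bins.
    """
    dists = np.linalg.norm(train_feats - test_feat, axis=1)  # Euclidean (assumed)
    mean_knn = np.sort(dists)[:k].mean()
    num_bins = len(bin_edges) - 1
    # np.digitize returns 0..len(bin_edges); clip so the index stays in 1..num_bins.
    bin_idx = int(np.clip(np.digitize(mean_knn, bin_edges), 1, num_bins))
    # Assumed mapping: higher bins (more severe anomalies) get more noising steps.
    return int(round(bin_idx / num_bins * t_max))

def anomaly_map(feat_x, feat_rec, z_x, z_rec, weight=0.5):
    """Linearly combine a cosine-distance feature map and an L1 latent map.

    feat_*: (C, H, W) feature maps; z_*: (Cz, H, W) latents (same H, W assumed).
    """
    eps = 1e-8
    dot = (feat_x * feat_rec).sum(axis=0)
    norms = np.linalg.norm(feat_x, axis=0) * np.linalg.norm(feat_rec, axis=0)
    f_map = 1.0 - dot / (norms + eps)         # cosine distance per location
    l_map = np.abs(z_x - z_rec).mean(axis=0)  # L1 distance per location
    return weight * f_map + (1.0 - weight) * l_map

# Example with toy data: 4 equal-width bins on [0, 4], at most 100 noising steps.
edges = np.linspace(0.0, 4.0, 5)
train = np.zeros((10, 3))
t_hat = dynamic_step(np.array([0.5, 0.0, 0.0]), train, edges, t_max=100)
```

A practical pipeline would typically also resize and normalize the two heatmaps before combining them; those steps are omitted here because the captions do not describe them.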