On Diffusion Modeling for Anomaly Detection

Victor Livernoche, Vineet Jain, Yashar Hezaveh, Siamak Ravanbakhsh

TL;DR

This work examines diffusion models for anomaly detection and finds that while DDPMs perform well, they are computationally intensive. It introduces Diffusion Time Estimation (DTE), which leverages the posterior distribution over diffusion time $p(\sigma_t^2 | \mathbf{x}_s)$ to score anomalies, deriving an analytic inverse-Gamma form and offering non-parametric and parametric (inverse-Gamma and categorical) estimators for scalable inference. Across the 57 datasets in the ADBench benchmark, DTE variants achieve competitive performance with significantly faster inference than DDPMs, and image embeddings further boost performance. Overall, diffusion-time-based anomaly detection emerges as a scalable alternative to traditional methods and deep-learning approaches in unsupervised and semi-supervised settings.
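
The inverse-Gamma form mentioned above can be motivated with a short worked sketch. The assumptions here are illustrative and are not taken from the paper: a single training point $\mathbf{x}_0 \in \mathbb{R}^d$, a forward process that only adds noise, $\mathbf{x}_s = \mathbf{x}_0 + \sigma_t \boldsymbol{\epsilon}$ with $\boldsymbol{\epsilon} \sim \mathcal{N}(0, I)$, and a flat prior over $\sigma_t^2$. The paper's own derivation may use a different prior or parametrisation.

$$
p(\mathbf{x}_s \mid \sigma_t^2) = (2\pi\sigma_t^2)^{-d/2} \exp\!\left(-\frac{\lVert \mathbf{x}_s - \mathbf{x}_0 \rVert^2}{2\sigma_t^2}\right)
$$

$$
p(\sigma_t^2 \mid \mathbf{x}_s) \;\propto\; (\sigma_t^2)^{-d/2} \exp\!\left(-\frac{\lVert \mathbf{x}_s - \mathbf{x}_0 \rVert^2}{2\sigma_t^2}\right)
$$

$$
\text{Matching } y^{-a-1} e^{-b/y}: \quad a = \frac{d}{2} - 1, \qquad b = \frac{\lVert \mathbf{x}_s - \mathbf{x}_0 \rVert^2}{2}, \qquad \text{mode} = \frac{b}{a+1} = \frac{\lVert \mathbf{x}_s - \mathbf{x}_0 \rVert^2}{d}
$$

Under these assumptions the posterior is a proper inverse-Gamma density for $d > 2$, and its mode grows with the squared distance between the input and the data, which is why a larger estimated diffusion time can serve as a larger anomaly score.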

Abstract

Known for their impressive performance in generative modeling, diffusion models are attractive candidates for density-based anomaly detection. This paper investigates different variations of diffusion modeling for unsupervised and semi-supervised anomaly detection. In particular, we find that Denoising Diffusion Probabilistic Models (DDPM) are performant on anomaly detection benchmarks yet computationally expensive. By simplifying DDPM in application to anomaly detection, we are naturally led to an alternative approach called Diffusion Time Estimation (DTE). DTE estimates the distribution over diffusion time for a given input and uses the mode or mean of this distribution as the anomaly score. We derive an analytical form for this density and leverage a deep neural network to improve inference efficiency. Through empirical evaluations on the ADBench benchmark, we demonstrate that all diffusion-based anomaly detection methods perform competitively for both semi-supervised and unsupervised settings. Notably, DTE achieves orders of magnitude faster inference time than DDPM, while outperforming it on this benchmark. These results establish diffusion-based anomaly detection as a scalable alternative to traditional methods and recent deep-learning techniques for standard unsupervised and semi-supervised anomaly detection settings.
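
One way to picture scoring by estimated diffusion time without any learned model is a k-nearest-neighbour score. The sketch below assumes, for illustration only, that the score is the mean squared distance from a query to its k nearest training points, divided by the dimension. The function name `knn_time_score`, the toy data, and the default `k=32` (borrowed from the Figure 3 caption) are assumptions, not the authors' implementation.

```python
import numpy as np

def knn_time_score(train, queries, k=32):
    """Hypothetical non-parametric score: mean squared kNN distance / dimension."""
    train = np.asarray(train, dtype=float)
    queries = np.asarray(queries, dtype=float)
    d = train.shape[1]
    # Pairwise squared Euclidean distances, shape (n_queries, n_train).
    diff = queries[:, None, :] - train[None, :, :]
    sq_dist = np.einsum("qnd,qnd->qn", diff, diff)
    k = min(k, train.shape[0])
    # np.partition places the k smallest values in the first k columns.
    nearest = np.partition(sq_dist, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1) / d

rng = np.random.default_rng(0)
normal_data = rng.normal(size=(500, 2))            # toy "normal" training set
test_points = np.array([[0.0, 0.0], [6.0, 6.0]])   # an inlier and a far-away point
scores = knn_time_score(normal_data, test_points)
print(scores)  # the second entry should be much larger than the first
```

The brute-force distance matrix keeps the example short; for larger datasets a spatial index or an approximate nearest-neighbour library would be the more practical choice.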

Paper Structure (41 sections, 11 equations, 20 figures, 12 tables, 1 algorithm)


Figures (20)

  • Figure 1: Average inference time vs. average AUC ROC for all 57 ADBench datasets in the semi-supervised setting. Lower right (faster inference, higher AUC ROC) is better; DTE Categorical lies in this region. Colour scheme: red (diffusion-based), green (deep learning), blue (classical).
  • Figure 2: DDPM and DTE on a toy dataset shown in (a). (b) shows the Gaussian density function associated with the lowest timestep of DDPM and (c) shows the vector field corresponding to the gradient of this density. (d) plots the mode of the DTE posterior distribution over diffusion time, which we show in subsequent sections is an inverse Gamma distribution. (e) shows the gradient of (d), and (f) shows the flow associated with this gradient, showing that random samples are mapped toward the data manifold.
  • Figure 3: Posterior timestep distribution $p(\sigma^2_t | \mathbf{x}_s)$, where $\mathbf{x}_s$ is produced using diffusion with different time steps $s \in \{1,\ldots, T\}$, averaged over the vertebral dataset. (a) shows the analytical distribution computed by placing Gaussian distributions of different variances at each point in the dataset, and (b) shows the inverse Gamma distribution with scale parameter value depending on the average distance to the k-nearest neighbours ($k=32$).
  • Figure 4: Predicted diffusion time against ground truth diffusion time for Gaussian model ($\ell_2$-regression), Inverse Gamma model, and categorical model (with seven bins) on the test set for various datasets. The maximum length of the diffusion Markov chain is $T=300$. The shaded region indicates the standard deviation in predictions across the dataset.
  • Figure 5: AUC ROC means and standard deviations on the 57 datasets from ADBench over five different seeds for (a) the semi-supervised setting using normal samples only for training and (b) the unsupervised setting with bootstrapped training instances. Colour scheme: red (diffusion-based), green (deep learning methods), blue (classical methods). DTE outperforms all baselines for the semi-supervised setting apart from kNN. It is also competitive in the unsupervised setting.
  • ...and 15 more figures
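
The categorical model in the Figure 4 caption predicts which of seven time bins a noised input belongs to, with chain length $T=300$. A minimal sketch of how training pairs for such a classifier might be built is given below. The linear noise schedule, `SIGMA_MAX`, and the helper names `make_batch` and `expected_bin` are assumptions introduced for illustration; the paper's actual schedule and network are not shown here.

```python
import numpy as np

T = 300          # maximum chain length, as quoted in the Figure 4 caption
NUM_BINS = 7     # number of time bins, as quoted in the Figure 4 caption
SIGMA_MAX = 5.0  # hypothetical largest noise scale (assumption)

def make_batch(x0, rng):
    """Noise each row of x0 at a random timestep; return inputs and bin labels."""
    n = x0.shape[0]
    t = rng.integers(0, T + 1, size=n)        # timesteps 0..T, 0 meaning no noise
    sigma = SIGMA_MAX * t / T                 # hypothetical linear schedule
    x_t = x0 + sigma[:, None] * rng.normal(size=x0.shape)
    labels = (t * NUM_BINS) // (T + 1)        # equal-width bins, values 0..6
    return x_t, labels

def expected_bin(probs):
    """Anomaly score as the mean bin index under predicted class probabilities."""
    return probs @ np.arange(probs.shape[1])

rng = np.random.default_rng(0)
x0 = rng.normal(size=(1000, 4))               # toy "normal" data
x_t, labels = make_batch(x0, rng)

# Two hand-written probability vectors standing in for classifier outputs.
demo_probs = np.array([[1.0, 0, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0, 0, 1.0]])
demo_scores = expected_bin(demo_probs)
```

Any classifier trained on `(x_t, labels)` could then supply `probs` at test time; taking the arg-max bin instead of the expectation would correspond to using the mode rather than the mean as the score.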