Unsupervised Anomaly Detection via Masked Diffusion Posterior Sampling

Di Wu, Shicai Fan, Xue Zhou, Li Yu, Yuzhong Deng, Jianxiao Zou, Baihong Lin

TL;DR

Unsupervised anomaly detection remains challenging when reconstruction quality is insufficient or the reconstruction lacks interpretability. The authors introduce Masked Diffusion Posterior Sampling (MDPS), which treats normal image reconstruction as Bayesian posterior sampling given a masked noisy observation, using a diffusion prior (DDIM) trained on normal data. By combining a masked observation model with posterior sampling and a joint pixel-level/perceptual-level scoring scheme across multiple samples, MDPS achieves state-of-the-art reconstruction quality and anomaly localization on MVTec and BTAD. Although drawing multiple posterior samples makes it computationally intensive, MDPS establishes a rigorous, interpretable framework for diffusion-based UAD with strong empirical gains and practical impact for industrial anomaly detection and localization.

Abstract

Reconstruction-based methods have been commonly used for unsupervised anomaly detection, in which a normal image is reconstructed and compared with the given test image to detect and locate anomalies. Recently, diffusion models have shown promising applications for anomaly detection due to their powerful generative ability. However, these models lack strict mathematical support for normal image reconstruction and unexpectedly suffer from low reconstruction quality. To address these issues, this paper proposes a novel and highly interpretable method named Masked Diffusion Posterior Sampling (MDPS). In MDPS, normal image reconstruction is mathematically modeled as repeated diffusion posterior sampling of normal images, based on the devised masked noisy observation model and the diffusion-based normal image prior under a Bayesian framework. Using a metric designed from pixel-level and perceptual-level perspectives, MDPS can effectively compute the difference map between each normal posterior sample and the given test image. Anomaly scores are obtained by averaging the difference maps over all posterior samples. Exhaustive experiments on the MVTec and BTAD datasets demonstrate that MDPS can achieve state-of-the-art performance in normal image reconstruction quality as well as anomaly detection and localization.

Paper Structure

This paper contains 24 sections, 20 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: Overview of MDPS. The MDPS denoiser (pink box) builds in part on the DDIM denoiser $\boldsymbol{\epsilon}(\boldsymbol{x}_t,t)$ used to sample from $p(\boldsymbol{x}_0)$. MDPS draws $n$ normal posterior samples, one from each of $n$ noisy versions of the test image $\boldsymbol{y}$. The final anomaly map is the average of the $n$ difference maps computed between these posterior samples and $\boldsymbol{y}$.
  • Figure 2: Comparison of reconstruction results on MVTec. In the areas marked by dotted boxes, MDPS gives the best reconstruction results.
  • Figure 3: Qualitative comparison of different guidance scales $\rho$. The original images and ground truths are shown to the left of the dotted line. To the right of it, the first and third rows show the reconstructed images, and the second and fourth rows show the heatmaps. Note the areas in the dotted boxes.
  • Figure 4: Ablation results of MDPS on MVTec.
  • Figure 5: Qualitative results of our method. We choose 12 examples from MVTec and BTAD; more results can be found in the supplementary material. From left to right: original images, normal reconstruction images, ground truth, and our localization results.
  • ...and 1 more figures
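
The scoring step described in the Figure 1 caption (average $n$ difference maps between the test image and $n$ normal posterior samples) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the names `pixel_difference`, `toy_features`, `perceptual_difference`, `anomaly_map`, and the `weight` parameter are hypothetical, the weighted sum used to combine the two maps is an assumption, and the gradient-based `toy_features` merely stands in for whatever perceptual feature extractor the paper uses. The posterior samples are also faked here as slightly noisy defect-free images rather than produced by a diffusion model.

```python
import numpy as np

def pixel_difference(y, x):
    """Pixel-level map: absolute difference averaged over channels -> (H, W)."""
    return np.abs(y - x).mean(axis=-1)

def toy_features(img):
    """Stand-in perceptual features (assumption): finite-difference gradients
    of the grayscale image. A real system would use learned features."""
    gray = img.mean(axis=-1)
    gx = np.diff(gray, axis=1, append=gray[:, -1:])  # horizontal gradient
    gy = np.diff(gray, axis=0, append=gray[-1:, :])  # vertical gradient
    return np.stack([gx, gy], axis=-1)               # (H, W, 2)

def perceptual_difference(y, x, features=toy_features):
    """Perceptual-level map: per-pixel L2 distance in feature space -> (H, W)."""
    return np.linalg.norm(features(y) - features(x), axis=-1)

def anomaly_map(y, samples, weight=0.5):
    """Average the combined difference maps over all posterior samples.
    The weighted sum is an assumed combination rule, for illustration only."""
    maps = [
        (1.0 - weight) * pixel_difference(y, x)
        + weight * perceptual_difference(y, x)
        for x in samples
    ]
    return np.mean(maps, axis=0)

# Toy data: an 8x8 RGB test image with a bright 2x2 "defect".
rng = np.random.default_rng(0)
y = np.zeros((8, 8, 3))
y[2:4, 2:4, :] = 1.0

# Fake "normal posterior samples": defect-free images with small noise.
samples = [rng.normal(0.0, 0.01, size=y.shape) for _ in range(4)]

amap = anomaly_map(y, samples)
print(amap.shape)
print(amap[2, 2] > amap[6, 6])
```

Averaging over several samples smooths out sample-specific reconstruction noise, so regions that differ from every normal reconstruction (here, the 2x2 patch) stand out in the final map.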