InvAD: Inversion-based Reconstruction-Free Anomaly Detection with Diffusion Models

Shunsuke Sakai, Xiangteng He, Chunzhi Gu, Leonid Sigal, Tatsuhito Hasegawa

TL;DR

InvAD addresses the efficiency-accuracy trade-off in diffusion-based anomaly detection by replacing pixel-space reconstruction with latent inversion. It runs a small number of inversion steps along a latent diffusion trajectory and uses a combined log-likelihood and norm-based score to robustly detect anomalies in a reconstruction-free setting. The key contributions are identifying the limitations of RGB-space denoising approaches, proposing a latent inversion-based framework, and demonstrating state-of-the-art performance with an approximately 2x inference-time speedup across industrial and medical benchmarks. The approach is plug-and-play for existing diffusion models and shows practical gains in real-world anomaly detection tasks.
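The exact form of the combined log-likelihood and norm-based score (the "NLL+Abs" scoring compared in Figure 4) is not given in this section, so the sketch below is one plausible reading under stated assumptions, not the authors' formula. It assumes the score adds the element-wise negative log-likelihood of the final latent under the standard Gaussian prior to the element's absolute value, averages over channels to get a spatial map, and averages the map for an image-level score. The function names and the `abs_weight` parameter are hypothetical.

```python
import numpy as np

def nll_abs_map(z_T, abs_weight=1.0):
    """Hypothetical per-location NLL+Abs score map for a final latent of shape (C, H, W).

    Element-wise: Gaussian NLL under the prior N(0, I) plus abs_weight * |z|.
    Averaging over the channel axis gives one score per spatial location.
    """
    z = np.asarray(z_T, dtype=np.float64)
    nll = 0.5 * z ** 2 + 0.5 * np.log(2.0 * np.pi)  # element-wise -log N(z; 0, 1)
    return (nll + abs_weight * np.abs(z)).mean(axis=0)  # (C, H, W) -> (H, W)

def nll_abs_score(z_T, abs_weight=1.0):
    """Image-level score: mean of the score map (higher means more anomalous)."""
    return float(nll_abs_map(z_T, abs_weight).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    typical = rng.standard_normal((4, 8, 8))  # resembles a draw from the prior N(0, I)
    shifted = typical + 3.0                   # latent far from the prior
    print(nll_abs_score(typical), nll_abs_score(shifted))
```

Latents that drift away from the prior raise both terms, so the shifted latent receives the larger score; how the two terms are actually weighted in InvAD is not stated here.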

Abstract

Despite their remarkable success, recent reconstruction-based anomaly detection (AD) methods via diffusion modeling still involve fine-grained noise-strength tuning and computationally expensive multi-step denoising, leading to a fundamental tension between fidelity and efficiency. In this paper, we propose InvAD, a novel inversion-based anomaly detection approach ("detection via noising in latent space") that circumvents explicit reconstruction. Importantly, we contend that the limitations of prior reconstruction-based methods originate from the prevailing "detection via denoising in RGB space" paradigm. To address this, we formulate AD in a reconstruction-free manner: we directly infer the final latent variable corresponding to the input image via DDIM inversion, and then measure its deviation from the known prior distribution for anomaly scoring. Specifically, when approximating the original probability flow ODE (PF-ODE) with the Euler method, we use only a few inversion steps to noise the clean image, for inference efficiency. Because the added noise is adaptively derived by the learned diffusion model, the original features of the clean test image can still be leveraged to yield high detection accuracy. We perform extensive experiments and detailed analyses across four widely used industrial and medical AD benchmarks under the unsupervised unified setting, demonstrating the effectiveness of our model: it achieves state-of-the-art AD performance and an approximately 2x inference-time speedup without diffusion distillation.
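The pipeline described in the abstract (DDIM inversion with a few Euler steps of the PF-ODE, then scoring the final latent against the known Gaussian prior) can be sketched as below. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the linear beta schedule, the evenly spaced inversion timesteps, the hypothetical `eps_model(x, t)` interface (with index 0 meaning the clean input), and the plain Gaussian NLL score are all assumptions. In practice `x0` would be the latent of the test image and `eps_model` the trained noise-prediction network.

```python
import numpy as np

def make_alpha_bar(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative noise schedule; index 0 is the clean input (alpha_bar = 1).

    The linear beta schedule is an assumption, not taken from the paper.
    """
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.concatenate([[1.0], np.cumprod(1.0 - betas)])

def ddim_invert(x0, eps_model, alpha_bar, num_inv_steps=3):
    """Deterministic DDIM inversion from a clean input x0 to the final latent x_T.

    Each loop iteration is one Euler step of the probability-flow ODE (in the
    DDIM parameterization), moving from a less noisy to a noisier timestep.
    eps_model(x, t) is a hypothetical noise-prediction interface.
    """
    T = len(alpha_bar) - 1
    timesteps = np.linspace(0, T, num_inv_steps + 1).round().astype(int)
    x = np.asarray(x0, dtype=np.float64)
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        a_cur, a_next = alpha_bar[t_cur], alpha_bar[t_next]
        eps = eps_model(x, t_cur)                                    # predicted noise at t_cur
        x0_pred = (x - np.sqrt(1.0 - a_cur) * eps) / np.sqrt(a_cur)  # implied clean input
        x = np.sqrt(a_next) * x0_pred + np.sqrt(1.0 - a_next) * eps  # re-noise to t_next
    return x

def gaussian_nll(z):
    """Negative log-likelihood of z under the standard Gaussian prior N(0, I)."""
    z = np.ravel(z)
    return 0.5 * np.sum(z ** 2) + 0.5 * z.size * np.log(2.0 * np.pi)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bar = make_alpha_bar()
    x0 = rng.standard_normal((4, 8, 8))          # stand-in for the latent of a test image
    toy_eps = lambda x, t: np.zeros_like(x)      # hypothetical stand-in for a trained network
    x_T = ddim_invert(x0, toy_eps, alpha_bar, num_inv_steps=3)
    print("anomaly score (NLL of x_T):", gaussian_nll(x_T))
```

The number of network evaluations equals `num_inv_steps`, which is why keeping the inversion short matters for speed; the all-zeros predictor in the demo only illustrates the call pattern.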

Paper Structure

This paper contains 33 sections, 19 equations, 4 figures, 12 tables, 1 algorithm.

Figures (4)

  • Figure 1: Accuracy vs. speed of diffusion-based AD methods [DiAD, TransFusion, omiad, decodiff, mad] on MVTecAD. Our proposed InvAD achieves state-of-the-art AD performance with a substantial speedup.
  • Figure 2: Conceptual comparison of the conventional and our proposed AD paradigms. The conventional reconstruction-based paradigm (a) first perturbs an input sample $\mathbf{x}_0$ to a latent state $\mathbf{x}_t$ at step $t$, and then denoises $\mathbf{x}_t$ back to $\mathbf{x}_0$. The anomaly score is computed as the mean squared error (MSE) between the original input and its reconstructed sample. In contrast, our inversion-based paradigm (b) directly infers the latent state at the final step, $\mathbf{x}_T$, by tracing the PF-ODE trajectory. The anomaly score is then determined by the typicality of $\mathbf{x}_T$ within the tractable latent distribution.
  • Figure 3: Visualization of anomaly localization compared with MDM [omiad] on MVTecAD. GT denotes the ground-truth anomaly map.
  • Figure 4: Histograms of anomaly scores for normal and anomalous samples on the hazelnut test set of MVTecAD, under (a) conventional NLL scoring and (b) our proposed NLL+Abs scoring.
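For comparison, the conventional reconstruction-based scoring of Figure 2(a) can be sketched as follows. This is a minimal NumPy illustration, not the implementation of any cited method: the linear schedule, deterministic DDIM denoising (rather than stochastic sampling), evenly spaced timesteps, and the hypothetical `eps_model(x, t)` interface are assumptions. The parameter `t_start` plays the role of the noise strength that, per the abstract, must be finely tuned.

```python
import numpy as np

def make_alpha_bar(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative noise schedule; index 0 is the clean input (assumed linear betas)."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.concatenate([[1.0], np.cumprod(1.0 - betas)])

def reconstruction_score(x0, eps_model, alpha_bar, t_start, num_denoise_steps, rng):
    """Conventional paradigm of Figure 2(a): noise x0 to step t_start, denoise, score by MSE.

    t_start is the noise-strength hyperparameter; eps_model(x, t) is a
    hypothetical noise-prediction interface.
    """
    x0 = np.asarray(x0, dtype=np.float64)
    a_t = alpha_bar[t_start]
    noise = rng.standard_normal(x0.shape)
    x = np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * noise  # forward perturbation to x_t
    timesteps = np.linspace(t_start, 0, num_denoise_steps + 1).round().astype(int)
    for t_cur, t_prev in zip(timesteps[:-1], timesteps[1:]):
        a_cur, a_prev = alpha_bar[t_cur], alpha_bar[t_prev]
        eps = eps_model(x, t_cur)                                    # predicted noise at t_cur
        x0_pred = (x - np.sqrt(1.0 - a_cur) * eps) / np.sqrt(a_cur)  # implied clean input
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps  # deterministic DDIM step
    return float(np.mean((x0 - x) ** 2))  # MSE between input and reconstruction
```

Here the score only exists after a full denoising pass, and its quality depends on the chosen `t_start`: too little noise leaves anomalies unrepaired, too much erases normal detail. This is the fidelity-efficiency tension the abstract attributes to the reconstruction-based paradigm.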