InvAD: Inversion-based Reconstruction-Free Anomaly Detection with Diffusion Models
Shunsuke Sakai, Xiangteng He, Chunzhi Gu, Leonid Sigal, Tatsuhito Hasegawa
TL;DR
InvAD addresses the efficiency-accuracy trade-off in diffusion-based anomaly detection by replacing pixel-space reconstruction with latent inversion. It inverts the input along a latent diffusion trajectory in only a few steps. It then detects anomalies robustly in a reconstruction-free setting, using a score that combines log-likelihood and norm-based terms. The key contributions are:

- identifying the limitations of RGB-space denoising approaches;
- proposing a latent-inversion-based framework;
- demonstrating state-of-the-art performance, with substantial inference-time gains (roughly 2x), across industrial and medical benchmarks.

The approach is plug-and-play for existing diffusion models and shows practical gains on real-world anomaly detection tasks.
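To illustrate what a prior-based score can look like, here is a minimal sketch. It assumes the inverted latent z should follow a standard Gaussian prior N(0, I). The score combines two terms. The first is the negative log-likelihood of z under that prior. The second is the distance of ||z|| from sqrt(d), which is where samples from a d-dimensional standard Gaussian concentrate. The function names, the per-dimension normalization, and the weights `alpha` and `beta` are hypothetical illustrations, not the paper's exact formula.

```python
import numpy as np


def gaussian_nll(z: np.ndarray) -> float:
    """Negative log-likelihood of z under a standard Gaussian prior N(0, I)."""
    d = z.size
    return 0.5 * float(np.sum(z ** 2)) + 0.5 * d * np.log(2.0 * np.pi)


def norm_deviation(z: np.ndarray) -> float:
    """Distance of ||z|| from sqrt(d); N(0, I) samples concentrate near this radius."""
    d = z.size
    return abs(float(np.linalg.norm(z)) - np.sqrt(d))


def anomaly_score(z: np.ndarray, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical weighted combination of the two prior-based terms.

    alpha and beta are illustrative weights (assumptions, not values from the paper).
    The NLL is divided by d so that both terms have comparable magnitudes.
    """
    d = z.size
    return alpha * gaussian_nll(z) / d + beta * norm_deviation(z)
```

For example, a latent drawn from N(0, I) gets a small score. The same latent scaled by 1.5, standing in for an out-of-distribution input, gets a much larger one, because its norm lands far from sqrt(d).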
Abstract
Despite their remarkable success, recent reconstruction-based anomaly detection (AD) methods built on diffusion models still require fine-grained tuning of the noise strength and computationally expensive multi-step denoising. This creates a fundamental tension between fidelity and efficiency. In this paper, we propose InvAD, a novel inversion-based anomaly detection approach ("detection via noising in latent space") that circumvents explicit reconstruction. We contend that the limitations of prior reconstruction-based methods originate from the prevailing "detection via denoising in RGB space" paradigm. To address this, we formulate AD in a reconstruction-free manner. We directly infer the final latent variable corresponding to the input image via DDIM inversion, and then measure its deviation from the known prior distribution to obtain an anomaly score. Specifically, when approximating the original probability flow ODE with the Euler method, we use only a few inversion steps to noise the clean image, which keeps inference efficient. Because the added noise is adaptively derived from the learned diffusion model, the original features of the clean test image can still be leveraged to achieve high detection accuracy. We perform extensive experiments and detailed analyses on four widely used industrial and medical AD benchmarks under the unsupervised unified setting. The results show that our model achieves state-of-the-art AD performance with an approximately 2x inference-time speedup, without diffusion distillation.
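Below is a minimal sketch of few-step deterministic DDIM inversion. The DDIM update can be read as an Euler step of the probability flow ODE in reparameterized variables. Several parts are assumptions made for illustration. `eps_model` is a hypothetical stand-in for the learned noise predictor. `x0` stands for the clean input (in a latent diffusion model, the encoder's latent of the image). The linear beta schedule (1e-4 to 0.02 over 1000 steps) is the common DDPM default, not necessarily the one InvAD uses. `num_steps=3` is only an illustrative choice of "a few" steps.

```python
from typing import Callable

import numpy as np


def linear_alpha_bar(num_train_steps: int = 1000,
                     beta_start: float = 1e-4,
                     beta_end: float = 0.02) -> np.ndarray:
    """Cumulative products alpha_bar_t for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)


def ddim_invert(x0: np.ndarray,
                eps_model: Callable[[np.ndarray, int], np.ndarray],
                alpha_bar: np.ndarray,
                num_steps: int = 3) -> np.ndarray:
    """Map a clean sample toward noise with a few deterministic DDIM inversion steps."""
    T = len(alpha_bar)
    # Evenly spaced timesteps from t=0 (clean) up to t=T-1 (noisiest).
    timesteps = np.linspace(0, T - 1, num_steps + 1).round().astype(int)
    x = x0.copy()
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        a_cur, a_next = alpha_bar[t_cur], alpha_bar[t_next]
        # Noise predicted at the current point (the usual inversion approximation).
        eps = eps_model(x, int(t_cur))
        # Estimate of the clean sample implied by x and eps.
        x0_pred = (x - np.sqrt(1.0 - a_cur) * eps) / np.sqrt(a_cur)
        # Move to the next (noisier) timestep, reusing the same eps.
        x = np.sqrt(a_next) * x0_pred + np.sqrt(1.0 - a_next) * eps
    return x
```

The number of `eps_model` calls equals `num_steps`, so a small step count keeps inference cheap. This is the efficiency argument in the abstract. In InvAD, the resulting latent would then be compared against the N(0, I) prior to produce the anomaly score.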
