Enhancing Multi-Class Anomaly Detection via Diffusion Refinement with Dual Conditioning
Jiawei Zhan, Jinxiang Lai, Bin-Bin Gao, Jun Liu, Xiaochen Chen, Chengjie Wang
TL;DR
This work tackles multi-class anomaly detection by addressing the low spatial resolution and blurry reconstructions of unified models. It introduces Diffusion Refinement with Dual Conditioning (DRDC), which combines a transformer-based base heatmap with a diffusion-refinement branch that performs inpainting at the original resolution and uses dual conditioning (model and test-time) to maintain category-awareness. A spatio-temporal fusion module then aggregates high-frequency refinements across timesteps, scales, and inpainting configurations to yield the final anomaly map. On benchmarks such as MVTec-AD and BeanTechAD, DRDC achieves state-of-the-art performance, notably improving localization of fine-grained defects while keeping inference efficient through selective timesteps and high-frequency reconstruction. For industrial inspection, the approach enables accurate, category-aware, unified anomaly detection with better localization and faster inference than full-diffusion sweeps.
Abstract
Anomaly detection, the technique of identifying abnormal samples using only normal samples, has attracted widespread interest in industry. Existing one-model-per-category methods often generalize poorly because each model focuses on a single category, and they can fail when product variations are encountered. Recent feature reconstruction methods, which are representative of one-model-all-categories schemes, face two challenges: they tend to reconstruct anomalous samples, and their reconstructions are blurry. In this paper, we combine a diffusion model and a transformer for multi-class anomaly detection. This approach leverages diffusion to obtain high-frequency information for refinement, greatly alleviating the blurry reconstruction problem while maintaining the sampling efficiency of the reverse diffusion process. The task is transformed into image inpainting to disconnect the input-output correlation, thereby mitigating the "identical shortcuts" problem and preventing the model from reconstructing anomalous samples. In addition, we introduce category-awareness through dual conditioning to ensure the accuracy of prediction and reconstruction in the reverse diffusion process, preventing excessive deviation from the target category and thus effectively enabling multi-class anomaly detection. Furthermore, spatio-temporal fusion is employed to fuse heatmaps predicted at different timesteps and scales, enhancing the performance of multi-class anomaly detection. Extensive experiments on benchmark datasets demonstrate that the proposed method outperforms existing methods and has strong multi-class anomaly detection capabilities.
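The abstract does not say how the spatio-temporal fusion is computed, so the following is only an illustrative sketch of the general idea of fusing heatmaps from different timesteps and scales. It assumes nearest-neighbour upsampling to a common resolution, a plain average over the refinement heatmaps, and a linear blend with the base heatmap. The function names `upsample_nearest` and `fuse_heatmaps` and the weight `alpha` are hypothetical and not taken from the paper.

```python
def upsample_nearest(heatmap, size):
    """Nearest-neighbour resize of a 2-D heatmap (list of rows) to size x size."""
    h, w = len(heatmap), len(heatmap[0])
    return [
        [heatmap[i * h // size][j * w // size] for j in range(size)]
        for i in range(size)
    ]


def fuse_heatmaps(base, refinements, size, alpha=0.5):
    """Blend a base heatmap with the mean of several refinement heatmaps.

    Assumption: simple averaging stands in for the paper's fusion module,
    whose actual form is not given in the abstract.

    base        -- coarse heatmap (e.g. from the transformer branch)
    refinements -- heatmaps predicted at different timesteps and scales
    size        -- side length of the fused output map
    alpha       -- hypothetical weight given to the refinement average
    """
    base_up = upsample_nearest(base, size)
    if not refinements:
        return base_up
    ups = [upsample_nearest(r, size) for r in refinements]
    n = len(ups)
    return [
        [
            (1 - alpha) * base_up[i][j] + alpha * sum(u[i][j] for u in ups) / n
            for j in range(size)
        ]
        for i in range(size)
    ]


# Example: a 2x2 base map fused with refinements at 4x4 and 8x8.
base = [[0.0, 0.0], [0.0, 0.0]]
refinements = [
    [[1.0] * 4 for _ in range(4)],
    [[3.0] * 8 for _ in range(8)],
]
fused = fuse_heatmaps(base, refinements, size=8, alpha=0.5)
```

In this sketch every map is brought to the output resolution before averaging, so fine-scale refinements and the coarse base map contribute to each pixel; a learned or frequency-aware weighting could replace the fixed `alpha`.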
