Enhancing Multi-Class Anomaly Detection via Diffusion Refinement with Dual Conditioning

Jiawei Zhan, Jinxiang Lai, Bin-Bin Gao, Jun Liu, Xiaochen Chen, Chengjie Wang

TL;DR

This work tackles multi-class anomaly detection by addressing two weaknesses of unified models: low spatial resolution and blurry reconstructions. It introduces Diffusion Refinement with Dual Conditioning (DRDC), which combines a transformer-based base heatmap with a diffusion-refinement branch that performs inpainting at the original resolution and uses dual conditioning (model and test-time) to maintain category-awareness. A spatio-temporal fusion module then aggregates high-frequency refinements across timesteps, scales, and inpainting configurations to yield the final anomaly map. On benchmarks such as MVTec-AD and BeanTechAD, DRDC achieves state-of-the-art performance, notably improving localization of fine-grained defects while keeping inference efficient through selective timesteps and high-frequency reconstruction. For industrial inspection, the approach offers accurate, category-aware, unified anomaly detection with faster inference than full-diffusion sweeps.

Abstract

Anomaly detection, the technique of identifying abnormal samples using only normal samples, has attracted widespread interest in industry. Existing one-model-per-category methods often generalize poorly because they focus on a single category, and can fail when encountering product variations. Recent feature reconstruction methods, as representatives of one-model-all-categories schemes, face challenges including the reconstruction of anomalous samples and blurry reconstructions. In this paper, we combine a diffusion model and a transformer for multi-class anomaly detection. This approach leverages diffusion to obtain high-frequency information for refinement, greatly alleviating the blurry reconstruction problem while maintaining the sampling efficiency of the reverse diffusion process. The task is transformed into image inpainting to disconnect the input-output correlation, thereby mitigating the "identical shortcuts" problem and preventing the model from reconstructing anomalous samples. In addition, we introduce category-awareness using dual conditioning to ensure the accuracy of prediction and reconstruction in the reverse diffusion process, preventing excessive deviation from the target category and thus effectively enabling multi-class anomaly detection. Furthermore, spatio-temporal fusion is employed to fuse heatmaps predicted at different timesteps and scales, enhancing the performance of multi-class anomaly detection. Extensive experiments on benchmark datasets demonstrate the superior performance and exceptional multi-class anomaly detection capabilities of our proposed method compared to others.
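
The inpainting formulation can be illustrated with a small sketch. It assumes the image is divided into square cells that are assigned to $n_s$ disjoint sets, so that each pass hides one set and every pixel is hidden exactly once across all passes. The cell size, the random assignment, and the function name `disjoint_masks` are illustrative assumptions, not details taken from the paper.

```python
import random


def disjoint_masks(height, width, cell, n_s, seed=0):
    """Build n_s binary masks over a height x width image.

    masks[k][y][x] == 1 means the pixel is hidden (to be inpainted) in pass k.
    Cells of size cell x cell are assigned to sets at random; this assignment
    rule is an assumption made for illustration.
    """
    rows = (height + cell - 1) // cell
    cols = (width + cell - 1) // cell
    cell_ids = list(range(rows * cols))
    random.Random(seed).shuffle(cell_ids)
    # Round-robin over the shuffled order gives each cell exactly one owner set.
    owner = {c: i % n_s for i, c in enumerate(cell_ids)}

    masks = [[[0] * width for _ in range(height)] for _ in range(n_s)]
    for y in range(height):
        for x in range(width):
            c = (y // cell) * cols + (x // cell)
            masks[owner[c]][y][x] = 1
    return masks


masks = disjoint_masks(height=8, width=8, cell=2, n_s=4)
```

Because a hidden region must be predicted from its visible surroundings rather than copied from the input, a model trained this way cannot simply reproduce an anomalous pixel it never sees.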

Paper Structure (17 sections, 17 equations, 3 figures, 5 tables)

Figures (3)

  • Figure 1: Overview of our proposed framework, which consists of the Base Model, Diffusion Refinement, and Spatio-temporal Heatmap Fusion. Samples are fed into both the base model and the diffusion refinement model. The base model produces a low-resolution base heatmap, while the diffusion refinement model generates high-resolution, high-frequency correction heatmaps. Finally, spatio-temporal fusion is employed to obtain the final anomaly map.
  • Figure 2: Sensitivity analysis of various hyper-parameters of our DRDC method, including the number of timesteps $n_t$, the initial index of diffusion $t_{n_t}$, the number of disjoint sets $n_s$, the mean filter size $m$, and the fusion hyper-parameter $\gamma$.
  • Figure 3: Examples of our proposed framework. Each case from left to right is the original image, the heatmap of the base model, the heatmap of the diffusion refinement, the final synthesized heatmap with the original image, and the ground-truth mask, respectively. Note that for illustration purposes, we only select $\mathcal{H}_{\text{diff}}$ at timestep $t\!=\!50$ as an example, while the output will fuse all generated heatmaps.
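
To make the roles of the mean filter size $m$ and the fusion hyper-parameter $\gamma$ more concrete, here is a minimal sketch of one way a base heatmap and several refinement heatmaps could be combined. The fusion rule used here (average the refinement heatmaps, smooth with an $m \times m$ mean filter, then add to the base heatmap with weight $\gamma$) is an assumption for illustration only; the captions above do not state the paper's actual fusion formula, and the names `mean_filter` and `fuse` are hypothetical.

```python
def mean_filter(heatmap, m):
    """Box filter of odd size m; the window is truncated at the borders."""
    h, w = len(heatmap), len(heatmap[0])
    r = m // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                heatmap[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
            ]
            out[y][x] = sum(window) / len(window)
    return out


def fuse(base, refinements, gamma=0.5, m=3):
    """Assumed fusion rule: base + gamma * mean_filter(average of refinements).

    `base` is assumed to be already upsampled to the refinement resolution.
    """
    h, w = len(base), len(base[0])
    n = len(refinements)
    averaged = [
        [sum(r[y][x] for r in refinements) / n for x in range(w)]
        for y in range(h)
    ]
    smoothed = mean_filter(averaged, m)
    return [
        [base[y][x] + gamma * smoothed[y][x] for x in range(w)]
        for y in range(h)
    ]


base = [[0.0] * 3 for _ in range(3)]
refinements = [[[1.0] * 3 for _ in range(3)], [[3.0] * 3 for _ in range(3)]]
fused = fuse(base, refinements, gamma=0.5, m=3)
```

In this toy setup the two refinement maps average to a constant 2.0, which the mean filter leaves unchanged, so every fused value is 0.0 + 0.5 × 2.0 = 1.0.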