DeltaDeno: Zero-Shot Anomaly Generation via Delta-Denoising Attribution

Chaoran Xu, Chengkan Lv, Qiyu Chen, Yunkang Cao, Feng Zhang, Zhengtao Zhang

TL;DR

DeltaDeno tackles zero-shot anomaly generation in data-scarce settings by contrasting two synchronized diffusion branches driven by a minimal normal vs. anomaly prompt pair. It derives an image-specific anomaly mask from per-step denoising differences (delta-denoising attribution) and uses mask-guided latent inpainting to synthesize localized defects while preserving surrounding context. To improve stability and realism, it adds token-level prompt refinement and spatial attention biasing, all without any anomalous samples or model fine-tuning. Across industrial and unseen domains, DeltaDeno delivers sharper, more realistic anomalies and yields consistent gains in downstream anomaly detection, making it practical for rapid data bootstrapping and stress testing.
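As a rough illustration of the delta-denoising idea described above, the following NumPy sketch accumulates per-step differences between the two branches into a heat map, thresholds it into a mask, and blends latents under that mask. All function names, array shapes, the absolute-difference aggregation, and the fixed 0.5 threshold are assumptions made for illustration; the summary does not specify the paper's actual aggregation or thresholding rule.

```python
import numpy as np

def accumulate_delta_map(eps_normal, eps_anomaly):
    """Accumulate per-step denoising differences into one (H, W) map.

    eps_*: hypothetical noise predictions of shape (T, C, H, W) from the
    normal-prompt and anomaly-prompt branches at the same T timesteps.
    Summing absolute differences is an assumed aggregation rule.
    """
    delta = np.abs(eps_anomaly - eps_normal)   # (T, C, H, W)
    heat = delta.sum(axis=(0, 1))              # sum over steps and channels
    span = heat.max() - heat.min()
    if span == 0:
        return np.zeros_like(heat)
    return (heat - heat.min()) / span          # min-max normalize to [0, 1]

def binarize(heat, threshold=0.5):
    """Turn the normalized map into a binary mask (threshold is assumed)."""
    return (heat >= threshold).astype(np.float32)

def blend_latents(z_anomaly, z_normal, mask):
    """Keep anomaly-branch latents inside the mask, normal-branch outside."""
    m = mask[None]                             # broadcast over channels
    return m * z_anomaly + (1.0 - m) * z_normal

# Toy usage: the two branches differ only in a 2x2 patch.
T, C, H, W = 4, 2, 8, 8
eps_n = np.zeros((T, C, H, W))
eps_a = eps_n.copy()
eps_a[:, :, 2:4, 2:4] += 1.0
heat = accumulate_delta_map(eps_n, eps_a)
mask = binarize(heat)
z = blend_latents(np.ones((C, H, W)), np.zeros((C, H, W)), mask)
```

In a real sampler the blend would presumably be applied at each late denoising step rather than once, so that the region outside the mask keeps following the normal branch.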

Abstract

Anomaly generation is often framed as few-shot fine-tuning with anomalous samples, which contradicts the scarcity that motivates generation and tends to overfit category priors. We tackle the setting where neither real anomaly samples nor training are available. We propose Delta-Denoising (DeltaDeno), a training-free zero-shot anomaly generation method that localizes and edits defects by contrasting two diffusion branches driven by a minimal prompt pair under a shared schedule. By accumulating per-step denoising deltas into an image-specific localization map, we obtain a mask that guides latent inpainting during later diffusion steps, preserving the surrounding context while generating realistic local defects. To improve stability and control, DeltaDeno performs token-level prompt refinement that aligns shared content and strengthens anomaly tokens, and applies a spatial attention bias restricted to anomaly tokens in the predicted region. Experiments on public datasets show that DeltaDeno achieves high generation quality and realism, along with consistent gains in downstream detection performance. Code will be made publicly available.
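The spatial attention bias mentioned in the abstract can be pictured with a small sketch. It assumes the bias is an additive constant on pre-softmax cross-attention logits, applied only where a pixel lies inside the predicted mask and the token is an anomaly token. The function name, the additive form, and the bias value are hypothetical; the abstract does not give the exact formulation.

```python
import numpy as np

def biased_cross_attention(logits, mask_flat, anomaly_token_ids, bias=2.0):
    """Softmax over tokens after boosting anomaly tokens inside the mask.

    logits: (P, K) float array of pre-softmax scores (P pixels, K tokens).
    mask_flat: (P,) array with 1 for pixels in the predicted anomaly region.
    anomaly_token_ids: indices of the anomaly tokens in the prompt.
    The additive bias is an assumed form, not taken from the paper.
    """
    biased = logits.astype(np.float64, copy=True)
    rows = np.flatnonzero(mask_flat)
    # Add the bias only at (masked pixel, anomaly token) positions.
    biased[np.ix_(rows, anomaly_token_ids)] += bias
    # Numerically stable row-wise softmax.
    biased -= biased.max(axis=1, keepdims=True)
    weights = np.exp(biased)
    return weights / weights.sum(axis=1, keepdims=True)

# Toy usage: 4 pixels, 3 tokens, token 2 is the anomaly token.
attn = biased_cross_attention(
    np.zeros((4, 3)), np.array([1, 0, 0, 1]), [2], bias=np.log(2.0)
)
```

With uniform logits and a bias of log 2, masked pixels put half of their attention on the anomaly token, while unmasked pixels stay uniform across tokens.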

Paper Structure

This paper contains 15 sections, 16 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: Comparison between visual anomaly generation methods. Compared with prior approaches, DeltaDeno delivers cross-category generalization, precise masks, and high realism while requiring no fine-tuning.
  • Figure 2: Overview of the DeltaDeno framework integrating delta-denoising localization, prompt refinement, and attention biasing.
  • Figure 3: Qualitative comparison with existing anomaly generation methods. Columns 1–4 are MVTec AD categories; the rightmost column presents unseen categories outside MVTec AD.
  • Figure 4: Ablation visualization of the main modules. (a) DeltaDeno (full). (b) w/o $L_{\text{ctx}}$. (c) w/o spatial attention biasing. (d) w/o latent mask inpainting during the late denoising stage.
  • Figure 5: Ablation visualization of the Anomaly Semantic Distillation module. Varying the description prompts steers the generated defect type and severity, yielding more controllable and diverse anomalies.