One-to-More: High-Fidelity Training-Free Anomaly Generation with Attention Control

Haoxiang Rao, Zhao Wang, Chenyang Si, Yan Lyu, Yuanyi Duan, Fang Zhao, Caifeng Shan

Abstract

Industrial anomaly detection (AD) is characterized by an abundance of normal images but a scarcity of anomalous ones. Although numerous few-shot anomaly synthesis methods have been proposed to augment anomalous data for downstream AD tasks, most existing approaches require time-consuming training and struggle to learn distributions that are faithful to real anomalies, thereby restricting the efficacy of AD models trained on such data. To address these limitations, we propose a training-free few-shot anomaly generation method, namely O2MAG, which leverages the self-attention in One reference anomalous image to synthesize More realistic anomalies, supporting effective downstream anomaly detection. Specifically, O2MAG manipulates three parallel diffusion processes via self-attention grafting and incorporates the anomaly mask to mitigate foreground-background query confusion, synthesizing text-guided anomalies that closely adhere to real anomalous distributions. To bridge the semantic gap between the encoded anomaly text prompts and the true anomaly semantics, Anomaly-Guided Optimization is further introduced to align the synthesis process with the target anomalous distribution, steering the generation toward realistic and text-consistent anomalies. Moreover, to mitigate faint anomaly synthesis inside anomaly masks, Dual-Attention Enhancement is adopted during generation to reinforce both self- and cross-attention on masked regions. Extensive experiments validate the effectiveness of O2MAG, demonstrating its superior performance over prior state-of-the-art methods on downstream AD tasks.
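
The mask-aware self-attention grafting mentioned in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so everything below is an assumption: the three branches are taken to be a reference anomalous image, a normal image, and the target being synthesized; target queries inside the target mask are assumed to attend only to reference keys/values inside the reference anomaly mask, and all other queries to the normal branch. The names `masked_attention` and `graft_self_attention` and all parameter names are hypothetical and are not taken from O2MAG.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def masked_attention(q, k, v, key_mask=None):
    """Scaled dot-product attention over token features.

    q: [n_q, d], k and v: [n_k, d]. key_mask is an optional boolean
    array of shape [n_k]; keys where it is False are ignored.
    Assumes key_mask, if given, has at least one True entry.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if key_mask is not None:
        scores = np.where(key_mask[None, :], scores, -1e9)
    return softmax(scores, axis=-1) @ v


def graft_self_attention(q_tgt, k_ref, v_ref, k_norm, v_norm, tgt_mask, ref_mask):
    """Hypothetical mask-aware grafting (an assumption, not the paper's code).

    Target queries inside tgt_mask read only from the reference branch's
    anomaly tokens (ref_mask); the remaining queries read from the normal
    branch, so foreground and background queries do not mix sources.
    """
    foreground = masked_attention(q_tgt, k_ref, v_ref, key_mask=ref_mask)
    background = masked_attention(q_tgt, k_norm, v_norm)
    return np.where(tgt_mask[:, None], foreground, background)
```

In a real diffusion pipeline a function like this would presumably replace the self-attention output of the target branch at selected layers and sampling steps; which layers and steps are used is not stated in the abstract, so the sketch leaves that open.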

Paper Structure (31 sections, 8 equations, 31 figures, 17 tables)

Figures (31)

  • Figure 1: Left: Comparison of diffusion-based anomaly generation methods. Training-based approaches either (i) add a defect block to learn the anomaly distribution or (ii) train embeddings via textual inversion to mimic anomalous visual styles, whereas (iii) the existing training-free method, AnomalyAny, fails to express precise and realistic anomaly semantics, while (iv) our proposed training-free O2MAG delivers background-faithful synthesis with diverse, localized anomalies. Right: Comparison of the training-free methods AnomalyAny and our O2MAG. In AnomalyAny, the generated glue-on-leather and wire-in-cable anomalies deviate from the real defect distribution.
  • Figure 2: Overview of the proposed O2MAG. Our method synthesizes anomalies by coordinating self-attention across three parallel diffusion processes. Dual-Attention Enhancement (DAE) operates on the target branch’s self-/cross-attention to enforce full-mask filling, while Anomaly-Guided Optimization (AGO) refines the anomaly text embedding to align its semantics with the reference anomaly image, making the generation more realistic.
  • Figure 3: (a) The intermediate reconstruction during the iterative denoising process, and (b) visualization of the self-attention map A at the 30th sampling step. Principal components preserve the image layout and regions rendered with the same color share semantic attributes.
  • Figure 4: Anomaly-Guided Optimization (AGO) pipeline. Numbers indicate different synthesized samples from $\boldsymbol{Z}_T^{ref}$ using the current optimized prompt embedding.
  • Figure 5: Qualitative comparison of generated results on MVTec-AD. The sub-image in the lower-right corner is the anomaly mask.
  • ...and 26 more figures