Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation

Ying Jin, Jinlong Peng, Qingdong He, Teng Hu, Jiafu Wu, Hao Chen, Haoxuan Wang, Wenbing Zhu, Mingmin Chi, Jun Liu, Yabiao Wang

TL;DR

Scarcity of anomaly data hampers industrial inspection tasks. The authors introduce DualAnoDiff, a dual-interrelated diffusion framework that simultaneously generates full anomaly images and their masks, coupled with a Background Compensation Module to preserve background integrity. The two branches share information through a Self-attention Interaction Module and are fine-tuned with LoRA, which yields realistic, well-aligned anomaly image-mask pairs with high diversity. On MVTec AD, downstream models trained on the generated data reach state-of-the-art performance in anomaly detection, pixel-level anomaly localization, and anomaly classification, which supports the method's use in data-efficient anomaly inspection systems.

Abstract

The performance of anomaly inspection in industrial manufacturing is constrained by the scarcity of anomaly data. To overcome this challenge, researchers have started employing anomaly generation approaches to augment the anomaly dataset. However, existing anomaly generation methods suffer from limited diversity in the generated anomalies and struggle to blend the anomaly seamlessly with the original image. Moreover, the generated mask is usually not aligned with the generated anomaly. In this paper, we address these challenges from a new perspective: simultaneously generating a pair consisting of the overall image and the corresponding anomaly part. We propose DualAnoDiff, a novel diffusion-based few-shot anomaly image generation model, which generates diverse and realistic anomaly images using a dual-interrelated diffusion model in which one branch generates the whole image while the other generates the anomaly part. Moreover, we extract background and shape information to mitigate the distortion and blurriness that arise in few-shot image generation. Extensive experiments demonstrate the superiority of our proposed model over state-of-the-art methods in terms of diversity, realism, and mask accuracy. Overall, our approach significantly improves the performance of downstream anomaly inspection tasks, including anomaly detection, anomaly localization, and anomaly classification.

Paper Structure (25 sections, 6 equations, 14 figures, 7 tables)

This paper contains 25 sections, 6 equations, 14 figures, 7 tables.

Figures (14)

  • Figure 1: Top: Anomaly generation quality is evaluated on four aspects: whether a valid anomaly is generated, the degree of realism, the alignment of the mask, and whether the location of the mask is reasonable. The results show that our generated results are better than those of the other methods. (Yellow areas represent valid generation, green indicates the correct mask or area, and red indicates the generated mask or wrong area.) Bottom: Our model can simultaneously generate extensive anomaly image-mask pairs.
  • Figure 2: The architecture of DualAnoDiff. 1) The two branches of DualAnoDiff generate the anomaly image and the corresponding anomaly part simultaneously, using different but nested prompts. 2) The two branches share attention information after every attention block through the Self-Attention Interaction Module (SAIM) during the denoising process, to keep the generated images consistent. 3) The Background Compensation Module (BCM) extracts the Key and Value of the background image and applies an adaptive fusion to SD, helping the model focus more on the object in the image.
  • Figure 3: (a) is the image generated by SD; (b) and (c) are the cross-attention maps in SD for the text tokens "a vfx with" and "sks", respectively.
  • Figure 4: Comparison between the models without (a) and with (b) the Background Compensation Module.
  • Figure 5: Comparison of the generation results on MVTec. Our model excels in generating high-quality anomaly images that are accurately aligned with the anomaly masks.
  • ...and 9 more figures
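
The Figure 2 caption says the two branches share attention information so that the whole image and the anomaly part stay consistent. One plausible reading of that idea is sketched below in plain Python: the tokens of both branches are concatenated, a single self-attention pass runs over the joint sequence, and the result is split back per branch. This is a minimal illustration and not the authors' implementation. The function names (`attention`, `shared_self_attention`), the use of identity Query/Key/Value projections, and the toy token values are all assumptions made for this example; the page does not specify how SAIM is actually implemented.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of token vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return out


def shared_self_attention(image_tokens, anomaly_tokens):
    """Hypothetical interaction step (assumption, not the paper's code):
    both branches attend over the union of their tokens, then the output
    is split back into per-branch sequences.
    Identity Q/K/V projections are used to keep the sketch short."""
    joint = image_tokens + anomaly_tokens
    mixed = attention(joint, joint, joint)
    n = len(image_tokens)
    return mixed[:n], mixed[n:]


# Toy example: two tokens for the whole-image branch, one for the anomaly branch.
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
anomaly_tokens = [[1.0, 1.0]]
img_out, ano_out = shared_self_attention(image_tokens, anomaly_tokens)
```

Because every output token is a weighted mix of tokens from both branches, changing the whole-image tokens also changes the anomaly-branch output, which is the kind of coupling the caption describes. A real model would apply learned projections and run this inside each attention block of the denoising network.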