AnomalyPainter: Vision-Language-Diffusion Synergy for Zero-Shot Realistic and Diverse Industrial Anomaly Synthesis

Zhangyu Lai, Yilin Lu, Xinyang Li, Jianghang Lin, Yansong Qu, Liujuan Cao, Ming Li, Rongrong Ji

TL;DR

AnomalyPainter addresses the realism-diversity trade-off in industrial anomaly synthesis by uniting Vision-Language Large Models (VLLMs), Latent Diffusion Models (LDMs), and a professional texture library Tex-9K to generate zero-shot, realistic anomalies. The method builds Tex-9K (75 categories, 8,792 textures), uses VLLMs to generate object-specific anomaly descriptions, matches descriptions to textures via CLIP, and applies ControlNet with Texture-Aware Latent Init to inpaint normal images with adapted textures, ensuring smooth boundaries. Extensive experiments on VisA and MVTec AD show superior synthesis quality (IS/IL) and improved downstream detection/localization AUROCs compared with RealNet and AnoDiff, validating both realism and diversity gains. The work provides three core contributions: a scalable zero-shot anomaly synthesis pipeline, the Tex-9K texture library, and the Texture-Aware Latent Init mechanism that stabilizes industrial image inpainting for realistic defect synthesis.
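
The description-to-texture matching step can be illustrated with a minimal sketch. It assumes that each anomaly description and each texture has already been embedded into a shared vector space (CLIP in the paper), and it ranks textures by cosine similarity. The function names, the `top_k` parameter, and the toy three-dimensional vectors are hypothetical placeholders, not the authors' code or real CLIP outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def match_textures(description_embedding, texture_embeddings, top_k=2):
    """Return the top_k (texture_name, score) pairs, best match first."""
    scored = [
        (name, cosine_similarity(description_embedding, emb))
        for name, emb in texture_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for embeddings (assumption: real ones would come from an
# image/text encoder such as CLIP and have hundreds of dimensions).
texture_embeddings = {
    "rust": [0.9, 0.1, 0.0],
    "scratch": [0.1, 0.9, 0.1],
    "stain": [0.5, 0.5, 0.5],
}
description_embedding = [1.0, 0.2, 0.0]  # e.g. "corroded metal surface"

matches = match_textures(description_embedding, texture_embeddings, top_k=2)
for name, score in matches:
    print(f"{name}: {score:.3f}")
```

Keeping several top-ranked textures per description, rather than only the best one, is one plausible way to preserve diversity in the synthesized anomalies.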

Abstract

While existing anomaly synthesis methods have made remarkable progress, achieving both realism and diversity in synthesis remains a major obstacle. To address this, we propose AnomalyPainter, a zero-shot framework that breaks the diversity-realism trade-off by synergizing a Vision Language Large Model (VLLM), a Latent Diffusion Model (LDM), and our newly introduced texture library Tex-9K. Tex-9K is a professional texture library containing 75 categories and 8,792 texture assets crafted for diverse anomaly synthesis. Leveraging the VLLM's general knowledge, reasonable anomaly text descriptions are generated for each industrial object and matched with relevant, diverse textures from Tex-9K. These textures then guide the LDM via ControlNet to paint on normal images. Furthermore, we introduce Texture-Aware Latent Init to stabilize the natural-image-trained ControlNet for industrial images. Extensive experiments show that AnomalyPainter outperforms existing methods in realism, diversity, and generalization, achieving superior downstream performance.

Paper Structure

This paper contains 21 sections, 1 equation, 12 figures, 8 tables, 2 algorithms.

Figures (12)

  • Figure 1: The blue hypersphere [liu2024deep] represents the normal sample distribution. A realistic anomaly distribution should lie close to it, while an unrealistic one should lie farther away. Anomaly samples synthesized by different methods exhibit different distributions.
  • Figure 2: Overview of AnomalyPainter. Our framework synthesizes diverse and realistic anomaly samples through three main steps. Middle: Professional Texture Library Construction constructs Tex-9K, a texture library with 8,792 texture assets, designed to provide diverse textures crafted for anomaly synthesis. Left: Anomaly Description Generation and Matching utilizes VLLM to generate reasonable anomaly descriptions for each industrial object and matches them with relevant textures from Tex-9K using cosine similarity. Right: Adaptive Texture Anomaly Generation utilizes Texture-Aware Latent Init to stabilize ControlNet's edge-mask control for LDM's high-realism inpainting, ensuring the seamless integration of relevant textures into normal industrial object images.
  • Figure 3: Texture-Aware Latent Initialization (TALI) blends the normal image latent $z^{N}$ and the adaptive texture latent $z^{\wp}$ at a later timestep $T^*$ to obtain $z_{T^*}^{\text{res}}$, which serves as the starting point for a better denoising result. For clarity, the images are shown in pixel space instead of latent space.
  • Figure 4: Left: An example of a successfully generated inpainting mask $M_{\text{in}}$ and its corresponding adaptive texture image $x_{\wp}$. Right: An example of the generated anomaly result and the refined mask.
  • Figure 5: Qualitative Comparison. It is clear that our method can generate diverse and realistic anomaly data across various industrial objects in multiple datasets. It outperforms both the best few-shot method, AnoDiff, and the best zero-shot method, RealNet.
  • ...and 7 more figures
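
The latent blending described in the Figure 3 caption can be sketched as follows. The caption only states that $z^{N}$ and $z^{\wp}$ are blended at timestep $T^*$; the exact formula is not given here, so this sketch assumes a standard DDPM forward-noising step and a binary mask that selects the texture latent inside the anomaly region. The function names, the `alpha_bar_t_star` parameter, the shared noise sample, and the flat-list latent representation are all hypothetical simplifications.

```python
import math
import random


def add_noise(z0, alpha_bar_t, noise):
    """Standard DDPM forward step: sqrt(a) * z0 + sqrt(1 - a) * eps."""
    scale_signal = math.sqrt(alpha_bar_t)
    scale_noise = math.sqrt(1.0 - alpha_bar_t)
    return [scale_signal * v + scale_noise * e for v, e in zip(z0, noise)]


def texture_aware_latent_init(z_normal, z_texture, mask, alpha_bar_t_star, seed=0):
    """Blend two latents noised to timestep T* (assumed mask-weighted form).

    mask[i] == 1 takes the texture latent, mask[i] == 0 keeps the normal one.
    """
    rng = random.Random(seed)
    # Assumption: one noise sample is shared by both latents.
    noise = [rng.gauss(0.0, 1.0) for _ in z_normal]
    z_normal_t = add_noise(z_normal, alpha_bar_t_star, noise)
    z_texture_t = add_noise(z_texture, alpha_bar_t_star, noise)
    return [
        m * zt + (1 - m) * zn
        for m, zt, zn in zip(mask, z_texture_t, z_normal_t)
    ]


# Toy flattened latents (assumption: real latents are multi-channel tensors).
z_normal = [1.0, 1.0, 1.0, 1.0]
z_texture = [5.0, 5.0, 5.0, 5.0]
mask = [0, 1, 1, 0]

# alpha_bar = 1.0 means no noise, so the result is the plain masked blend.
z_res_clean = texture_aware_latent_init(z_normal, z_texture, mask, alpha_bar_t_star=1.0)
# A smaller alpha_bar corresponds to a noisier starting point at T*.
z_res_noisy = texture_aware_latent_init(z_normal, z_texture, mask, alpha_bar_t_star=0.5)
print(z_res_clean)
```

Starting denoising from such a blended latent, rather than from pure noise, would give the sampler a texture-bearing initialization inside the mask while leaving the rest of the image anchored to the normal sample.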