AnomalyPainter: Vision-Language-Diffusion Synergy for Zero-Shot Realistic and Diverse Industrial Anomaly Synthesis
Zhangyu Lai, Yilin Lu, Xinyang Li, Jianghang Lin, Yansong Qu, Liujuan Cao, Ming Li, Rongrong Ji
TL;DR
AnomalyPainter addresses the realism-diversity trade-off in industrial anomaly synthesis by combining Vision Language Large Models (VLLMs), Latent Diffusion Models (LDMs), and a professional texture library, Tex-9K, to synthesize realistic anomalies in a zero-shot setting. The method builds Tex-9K (75 categories, 8,792 textures), uses a VLLM to generate object-specific anomaly descriptions, matches those descriptions to textures via CLIP, and uses ControlNet with Texture-Aware Latent Init to paint the adapted textures onto normal images with smooth boundaries. Experiments on VisA and MVTec AD show higher synthesis quality (IS/IL) and improved downstream detection and localization AUROC compared with RealNet and AnoDiff, supporting the claimed gains in both realism and diversity. The work makes three core contributions: a scalable zero-shot anomaly synthesis pipeline, the Tex-9K texture library, and the Texture-Aware Latent Init mechanism, which stabilizes inpainting on industrial images.
Abstract
While existing anomaly synthesis methods have made remarkable progress, achieving both realism and diversity in synthesis remains a major obstacle. To address this, we propose AnomalyPainter, a zero-shot framework that breaks the diversity-realism trade-off by combining a Vision Language Large Model (VLLM), a Latent Diffusion Model (LDM), and our newly introduced texture library, Tex-9K. Tex-9K is a professional texture library containing 75 categories and 8,792 texture assets crafted for diverse anomaly synthesis. Leveraging the VLLM's general knowledge, we generate plausible anomaly text descriptions for each industrial object and match them with relevant, diverse textures from Tex-9K. These textures then guide the LDM via ControlNet to paint anomalies onto normal images. Furthermore, we introduce Texture-Aware Latent Init to stabilize the natural-image-trained ControlNet on industrial images. Extensive experiments show that AnomalyPainter outperforms existing methods in realism, diversity, and generalization, achieving superior downstream performance.
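The description-to-texture matching step can be illustrated with a minimal sketch. It assumes that each anomaly description and each Tex-9K texture has already been embedded into a shared space (the TL;DR names CLIP for this), and that matching is a top-k ranking by cosine similarity. The function names, the toy three-dimensional vectors, and the value of k are hypothetical and chosen for illustration only; the paper's actual matching procedure may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_textures(text_emb, texture_embs, k=2):
    """Return the names of the k textures most similar to the description.

    text_emb:     embedding of one anomaly description (assumed precomputed)
    texture_embs: dict mapping texture name -> embedding (assumed precomputed)
    """
    scored = [(cosine(text_emb, emb), name) for name, emb in texture_embs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Toy stand-ins for real embeddings; the values are made up for illustration.
texture_embs = {
    "rust": [0.9, 0.1, 0.0],
    "scratch": [0.1, 0.9, 0.1],
    "stain": [0.0, 0.2, 0.9],
}
description_emb = [0.8, 0.3, 0.1]  # e.g. a description of a corroded patch

top = match_textures(description_emb, texture_embs, k=2)
print(top)
```

Retrieving several textures per description rather than only the best one is one simple way to keep the synthesized anomalies varied while staying relevant to the described defect.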
