Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation
Mingyu Lee, Jongwon Choi
TL;DR
The paper tackles data scarcity in industrial anomaly detection by introducing a text-guided variational image generation framework that synthesizes non-defective images aligned with textual and visual priors. It combines a keyword-to-prompt generator, a variance-aware extension of VQGAN, and a text-guided knowledge integrator to produce diverse, status-consistent non-defective data that preserve variance. Across MVTec AD, BTAD, and MVTec LOCO AD, the approach yields substantial improvements in detection and segmentation, especially in one-shot and few-shot settings, and generalizes across multiple baselines. The work highlights the importance of modeling latent variance and semantic alignment in data augmentation for robust anomaly detection with limited real non-defective data, offering a practical path for industrial deployments.
Abstract
We propose a text-guided variational image generation method to address the challenge of obtaining clean data for anomaly detection in industrial manufacturing. Our method uses text information about the target object, learned from an extensive library of text documents, to generate non-defective images resembling the input image. The proposed framework ensures that the generated non-defective images align with the distributions anticipated from textual and image-based knowledge, which provides stability and generality. Experimental results demonstrate the effectiveness of our approach, which surpasses previous methods even with limited non-defective data. Our approach is validated through generalization tests across four baseline models and three distinct datasets. We also present an additional analysis of how the generated images can be used to enhance the effectiveness of anomaly detection models.
