Controllable Image Synthesis of Industrial Data Using Stable Diffusion
Gabriele Valvano, Antonino Agostino, Giovanni De Magistris, Antonino Graziano, Giacomo Veneri
TL;DR
The paper addresses the scarcity of industrial defect data, which are also hard to annotate, by adapting a large pre-trained diffusion model to the industrial domain. It introduces a two-phase method. First, DreamBooth is used to teach Stable Diffusion a new industrial concept, injecting a domain prior into the model. Then, a conditioning mechanism is learned via HyperNetworks, using topological drivers to enforce the geometry and location of defects. The resulting self-annotated images are used to train downstream instance segmentation models, improving crack detection and segmentation, particularly when little real data is available. The approach offers a practical, label-efficient path to scalable industrial AI, reducing data collection and annotation costs while supporting the development of robust models for production environments.
Abstract
Training supervised deep neural networks to perform defect detection and segmentation requires large-scale, fully annotated datasets, which can be hard or even impossible to obtain in industrial environments. Generative AI offers opportunities to enlarge small industrial datasets artificially, thus enabling the use of state-of-the-art supervised approaches in industry. Unfortunately, good generative models also need a lot of training data, while industrial datasets are often tiny. Here, we propose a new approach for reusing general-purpose pre-trained generative models on industrial data, ultimately allowing the generation of self-labelled defective images. First, we let the model learn a new concept that captures the novel data distribution. Then, we force it to learn to condition the generative process, producing industrial images that satisfy well-defined topological characteristics and show defects with a given geometry and location. To highlight the advantage of our approach, we use the synthetic dataset to optimise a crack segmentor for a real industrial use case. When little real data is available, we observe a considerable performance increase under several metrics, showing the method's potential in production environments.
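The self-labelling idea can be illustrated with a minimal sketch, under stated assumptions: if the generator is conditioned on a binary defect mask that fixes the crack's geometry and location, that same mask can serve as the segmentation label of the generated image, so no manual annotation is needed. Here `random_crack_mask` is a hypothetical random-walk stand-in for the paper's topological drivers, and `placeholder_generator` is a hypothetical stub in place of the fine-tuned, conditioned Stable Diffusion model; neither reflects the authors' actual implementation.

```python
import random
from typing import Callable, List, Optional, Tuple

Mask = List[List[int]]      # binary defect mask (1 = crack pixel)
Image = List[List[float]]   # grayscale image with values in [0, 1]
Generator = Callable[[Mask], Image]


def random_crack_mask(height: int, width: int, length: int,
                      seed: Optional[int] = None) -> Mask:
    """Hypothetical topological driver: a random-walk crack drawn on a binary mask."""
    rng = random.Random(seed)
    mask = [[0] * width for _ in range(height)]
    # Random starting point, so the crack's location is known exactly.
    r, c = rng.randrange(height), rng.randrange(width)
    for _ in range(length):
        mask[r][c] = 1
        # Step mostly rightwards with vertical jitter for an elongated, crack-like shape.
        r = min(max(r + rng.choice((-1, 0, 1)), 0), height - 1)
        c = min(max(c + rng.choice((0, 1, 1)), 0), width - 1)
    return mask


def placeholder_generator(mask: Mask) -> Image:
    """Hypothetical stub for the conditioned diffusion model: darkens crack pixels."""
    return [[0.2 if m else 0.8 for m in row] for row in mask]


def make_synthetic_dataset(n: int, generator: Generator, height: int = 64,
                           width: int = 64, length: int = 40,
                           seed: int = 0) -> List[Tuple[Image, Mask]]:
    """Build (image, label) pairs where the conditioning mask doubles as the label."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n):
        mask = random_crack_mask(height, width, length, seed=rng.randrange(2**32))
        image = generator(mask)
        # The condition itself is the ground-truth segmentation: no manual labelling.
        dataset.append((image, mask))
    return dataset
```

For example, `make_synthetic_dataset(100, placeholder_generator)` yields 100 image/mask pairs that a segmentation network could be trained on; in a real pipeline the stub would be replaced by the conditioned diffusion model.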
