Controllable Image Synthesis of Industrial Data Using Stable Diffusion

Gabriele Valvano, Antonino Agostino, Giovanni De Magistris, Antonino Graziano, Giacomo Veneri

TL;DR

The paper addresses the challenge of scarce, hard-to-annotate industrial defect data by adapting a large pre-trained diffusion model to the domain. It introduces a two-phase method: first learn a new industrial concept with DreamBooth to inject a prior into Stable Diffusion, then learn a conditioning mechanism via HyperNetworks using topological drivers to enforce defect geometry and location. The generated self-annotated data are used to train downstream instance segmentation models, demonstrating improved crack detection and segmentation, even with limited real data. This approach offers a practical path to scalable, label-efficient industrial AI, reducing data collection and annotation costs while enabling robust production-use models.

Abstract

Training supervised deep neural networks that perform defect detection and segmentation requires large-scale, fully annotated datasets, which can be hard or even impossible to obtain in industrial environments. Generative AI offers opportunities to enlarge small industrial datasets artificially, thus enabling the use of state-of-the-art supervised approaches in industry. Unfortunately, even good generative models need a lot of data to train, while industrial datasets are often tiny. Here, we propose a new approach for reusing general-purpose pre-trained generative models on industrial data, ultimately allowing the generation of self-labelled defective images. First, we let the model learn the new concept, entailing the novel data distribution. Then, we force it to learn to condition the generative process, producing industrial images that satisfy well-defined topological characteristics and show defects with a given geometry and location. To highlight the advantage of our approach, we use the synthetic dataset to optimise a crack segmentor for a real industrial use case. When the available data is small, we observe a considerable performance increase under several metrics, showing the method's potential in production environments.

Paper Structure (22 sections, 8 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Method overview. Our approach involves two main steps: i) learn the concept and ii) learn the condition. For the first step, we utilise a high-capacity model that has previously learned a general-purpose image prior. For simplicity, we use Stable Diffusion, which we inject with the knowledge of a new, previously unseen concept. In the second phase, we enforce label-driven constraints to ensure that the generation process adheres to specific criteria. Finally, with the resulting conditional generator, we can produce self-annotated data that we can leverage to optimise other models to perform supervised downstream tasks, such as crack detection and segmentation.
  • Figure 2: Generic prior vs real industrial images. The image prior learned by pre-trained foundation models, such as Stable Diffusion, does not match actual industrial data. On the left, we show four images generated by Stable Diffusion with the input prompt: "A borescope image of the inside of a combustor chamber". On the right, we report four real industrial images for comparison. We suggest overcoming this change in distribution as described in \ref{subsec:learn_the_concept}.
  • Figure 3: Conditioning pre-trained unconditional diffusion models with HyperNetworks. The HyperNetwork (yellow box) influences the image generator (cyan box) to output an image satisfying the input conditioning factors. As the conditioning mechanism, we use coarse geometrical drivers and a defect mask. Furthermore, we condition the diffusion model by slightly altering the input prompt from "an image of a [V]" to "an image of a [V] with a [class] crack". For each condition, we replace [class] with the name of the category shown inside the defect mask, further highlighting crack-related information.
  • Figure 4: Model Overview. Top: after learning the new concept (\ref{subsec:learn_the_condition}), a HyperNetwork (yellow box) uses an input topological driver to condition the image generation of a pre-trained diffusion model (cyan box). With the given condition, the model produces an image matching the topological driver and showing a defect corresponding to the defect mask inside the driver. Bottom: extracting the topological driver from an input image. In the first part of the process, we apply a lossy compression to the image colours. Then, we add the defect mask to the resulting image, ensuring that the driver carries accurate pixel-level information about the defect topology and class. The resulting image shows coarse information about the image topology and a segmentation mask of the defect that the generative model must reproduce.
  • Figure 5: Topological drivers. Top row: at training, we extract the driver from defective images and add a real defect segmentation mask on top of it. Bottom row: at inference, we extract the driver from defect-free images and add an arbitrary segmentation mask to the image. Notice that, simply by looking at the final result, it is impossible to say whether the topological map with the defect mask comes from a defective or a defect-free image. Hence, a model trained using only defective data will be biased to synthesise a defective image for the input conditioning in the bottom row as well.
  • ...and 1 more figures
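
The driver extraction in the Figure 4 caption (lossy colour compression, then the defect mask drawn on top) and the prompt change in the Figure 3 caption can be sketched as below. This is a minimal illustration, not the authors' implementation. The captions do not say which compression is used, so uniform per-channel quantisation stands in for it here. The function names, the `levels` parameter, the class-to-colour mapping and the class name "linear" are all hypothetical.

```python
import numpy as np


def quantize_colours(image: np.ndarray, levels: int = 4) -> np.ndarray:
    """Lossy colour compression (assumed stand-in): snap each 8-bit channel
    value to the centre of one of `levels` evenly sized bins."""
    if image.dtype != np.uint8:
        raise ValueError("expected a uint8 image")
    if not 1 <= levels <= 256:
        raise ValueError("levels must be in [1, 256]")
    step = 256 // levels
    # Work in int32 so the arithmetic cannot overflow 8 bits.
    bins = np.minimum(image.astype(np.int32) // step, levels - 1)
    return np.clip(bins * step + step // 2, 0, 255).astype(np.uint8)


def make_driver(image: np.ndarray, defect_mask: np.ndarray,
                class_colours: dict, levels: int = 4) -> np.ndarray:
    """Build a topological driver from an H x W x 3 image and an H x W mask.

    `defect_mask` holds 0 for background and a positive class id for defect
    pixels; `class_colours` maps each class id to an RGB colour
    (a hypothetical encoding of the defect class)."""
    driver = quantize_colours(image, levels)
    for class_id, colour in class_colours.items():
        # Overwrite defect pixels so the mask survives at pixel level.
        driver[defect_mask == class_id] = colour
    return driver


def build_prompt(concept_token: str = "[V]", crack_class=None) -> str:
    """Return the base prompt, or the class-specific variant from Figure 3."""
    base = f"an image of a {concept_token}"
    return base if crack_class is None else f"{base} with a {crack_class} crack"


# Usage: a flat grey image with a single defect pixel of (hypothetical) class 1.
image = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.int64)
mask[1, 2] = 1
driver = make_driver(image, mask, {1: (255, 0, 0)})
prompt = build_prompt(crack_class="linear")
```

At inference, as the Figure 5 caption describes, the same `make_driver` call would take a defect-free image together with an arbitrary mask, so the driver has the same form in both settings.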