Defect Image Sample Generation With Diffusion Prior for Steel Surface Defect Recognition

Yichun Tai, Kun Yang, Tao Peng, Zhenzhen Huang, Zhijiang Zhang

TL;DR

This work tackles the data scarcity problem in steel surface defect recognition by leveraging a diffusion prior via StableSDG. It introduces a two‑stage adaptation pipeline that aligns Stable Diffusion to defect distributions through token embedding optimization and low-rank network adaptation, followed by image‑oriented data generation that starts from partially perturbed real samples. The approach is guided by iterative quality evaluation using FID to produce high‑fidelity defect images, which are then used to train robust defect classifiers. Empirical results on NEU and CCBSD show superior generation fidelity and notable improvements in recognition accuracy compared to multiple baselines, highlighting the practical impact of diffusion priors for industrial data expansion.

Abstract

Steel surface defect recognition is an industrial problem of great practical value. Data insufficiency is the major challenge in training a robust defect recognition network. Existing methods have sought to enlarge the dataset by generating samples with generative models. However, their generation quality is still limited by the scarcity of defect image samples. To this end, we propose Stable Surface Defect Generation (StableSDG), which transfers the vast generation distribution embedded in the Stable Diffusion model to steel surface defect image generation. To tackle the distinctive distribution gap between steel surface images and the images generated by the diffusion model, we propose two processes. First, we align the distributions by adapting the parameters of the diffusion model in both the token embedding space and the network parameter space. Second, in the generation process, we propose image-oriented generation, which starts from real images rather than from pure Gaussian noise. We conduct extensive experiments on steel surface defect datasets, demonstrating state-of-the-art performance in generating high-quality samples and in training recognition models, and showing that both proposed processes contribute significantly to this performance.
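
As a rough illustration of image-oriented generation, the sketch below perturbs a real sample only part of the way along a noise schedule, using the standard DDPM forward process $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$. The linear schedule, the `strength` parameter, and all function names are assumptions made for this example, not details taken from the paper. The reverse (denoising) pass, which would start from `x_t` instead of pure noise, is omitted.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed schedule, as in DDPM)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_s) for s = 0..t."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod

def perturb(x0, t, betas, rng):
    """Forward-diffuse a real sample x0 to step t."""
    a_bar = alpha_bar(betas, t)
    signal, noise = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]

betas = linear_beta_schedule(1000)
rng = random.Random(0)
x0 = [0.5, -0.25, 1.0, 0.0]  # stand-in for a flattened defect image or latent
strength = 0.5               # hypothetical fraction of the schedule to apply
t_start = int(strength * len(betas)) - 1
x_t = perturb(x0, t_start, betas, rng)  # partially perturbed real sample
```

A smaller `strength` keeps more of the real sample's structure, while a value of 1.0 approaches generation from pure Gaussian noise.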

Paper Structure

This paper contains 20 sections, 7 equations, 10 figures, 8 tables, and 1 algorithm.

Figures (10)

  • Figure 1: Images generated by the Stable Diffusion model (Rombach et al., 2022) with the prompt "steel surface defect".
  • Figure 2: The overall pipeline, including defect generation, quality evaluation, and defect recognition. The dotted arrow indicates that quality evaluation is conducted iteratively until the optimal hyperparameters are found, which are then used to construct the generated dataset.
  • Figure 3: Overview of StableSDG. In the generator adaptation process, given the prompt $y$ (i.e., "A photo of <unknown>") and the defect images $\mathbf{x}$ as input, we first optimize only the token embedding $v_d$ corresponding to the defect concept. We then adapt the trainable matrices within the attention layers of both the text encoder and the U-Net, using the previously optimized $v_d^*$. Next, image-oriented generation (the data generation process) produces defect samples, which are then used to train a defect recognition classifier (the defect recognition process).
  • Figure 4: The illustration of defect categories in CCBSD.
  • Figure 5: Intermediate results of StableSDG for various defect categories.
  • ...and 5 more figures
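
The low-rank adaptation of attention-layer matrices mentioned in the Figure 3 caption can be sketched in its generic form: a frozen weight $W$ is combined with a trainable low-rank product $BA$. This is a minimal illustration of the general technique, not the paper's implementation. The helper names, the `scale` factor, the rank, and the zero initialization of `b` are assumptions made for this example.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def lora_weight(w, b, a, scale=1.0):
    """Effective weight W + scale * (B @ A); W stays frozen, only B and A train."""
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]

w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # frozen 3x3 weight
b = [[0.0], [0.0], [0.0]]  # 3x1, zero-initialized so adaptation starts from W
a = [[0.1, 0.2, 0.3]]      # 1x3, rank r = 1

w_start = lora_weight(w, b, a)                          # equals w before training
w_adapted = lora_weight(w, [[1.0], [0.0], [2.0]], a)    # after a made-up update to B
```

With rank $r$, the update trains $r(d_{in} + d_{out})$ values instead of $d_{in} d_{out}$, which is the usual motivation for this kind of adaptation when training images are scarce.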