DefectFill: Realistic Defect Generation with Inpainting Diffusion Model for Visual Inspection

Jaewoo Song, Daemin Park, Kanghyun Baek, Sangyub Lee, Jooyoung Choi, Eunji Kim, Sungroh Yoon

TL;DR

DefectFill tackles the data-scarcity challenge in visual inspection by learning realistic defect concepts from a few reference image-mask pairs with a fine-tuned inpainting diffusion model. It introduces three defect-focused losses ($\mathcal{L}_{def}$, $\mathcal{L}_{obj}$, $\mathcal{L}_{attn}$), combined into a single DefectFill objective, enabling precise, context-aware defect synthesis; Low-Fidelity Selection then picks, among the generated samples, the one in which the defect is most pronounced. Empirical results on the MVTec AD dataset show state-of-the-art generation quality (KID and IC-LPIPS) and improved downstream anomaly classification and localization when inspection models are trained on the synthesized defects. The approach demonstrates strong realism and transferability, making it especially suitable for industrial settings where defect data are scarce, though global-defect cases remain challenging.

Abstract

Developing effective visual inspection models remains challenging due to the scarcity of defect data. While image generation models have been used to synthesize defect images, producing highly realistic defects remains difficult. We propose DefectFill, a novel method for realistic defect generation that requires only a few reference defect images. It leverages a fine-tuned inpainting diffusion model, optimized with our custom loss functions incorporating defect, object, and attention terms. It enables precise capture of detailed, localized defect features and their seamless integration into defect-free objects. Additionally, our Low-Fidelity Selection method further enhances the defect sample quality. Experiments show that DefectFill generates high-quality defect images, enabling visual inspection models to achieve state-of-the-art performance on the MVTec AD dataset.

Paper Structure

This paper contains 46 sections, 9 equations, 26 figures, 7 tables.

Figures (26)

  • Figure 1: Given a few reference image-mask pairs of a defect (e.g. "hole" of a hazelnut), DefectFill learns the defect and realistically fills it onto defect-free objects in desired shapes (e.g. star, square, etc.), generating new defect images that integrate naturally with the objects. These generated images are then used for visual inspection tasks.
  • Figure 2: Defect learning overview. To fine-tune the inpainting diffusion model, we compute three types of loss ($\mathcal{L}_{def}$, $\mathcal{L}_{attn}$, and $\mathcal{L}_{obj}$) using an image $I$ and a defect mask $M$. The image $I$ is duplicated, and each copy is combined with a different mask ($M$ or $M_{rand}$) and prompt ($\mathcal{P}_{def}$: "A photo of [$V^*$]", or $\mathcal{P}_{obj}$: "A hazelnut with [$V^*$]") as input to the model. The model prediction using the defect prompt $\mathcal{P}_{def}$ (upper pipeline) is used to compute $\mathcal{L}_{def}$, while the prediction using the object prompt $\mathcal{P}_{obj}$ (lower pipeline) is used to compute $\mathcal{L}_{attn}$ and $\mathcal{L}_{obj}$.
  • Figure 3: Low-Fidelity Selection (LFS) for the glue defect on leather. LFS automatically selects the defect image with the most pronounced expression (blue box) by identifying the sample with the lowest fidelity (highest LPIPS score) in the masked area.
  • Figure 4: Generated Defects by DefectFill. The first row displays the normal images (green boxes), while the second row shows the generated defect images along with their masks, and the third row provides zoomed-in views of the defects (red boxes). The zoomed images highlight the realistic and detailed rendering of the defects.
  • Figure 5: Defect Generation Comparisons. This figure compares the quality of defect images generated by our method (bottom row) with baseline approaches. Our method produces the most realistic results, with defects that blend seamlessly into the objects.
  • ...and 21 more figures
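
The selection rule described in the Figure 3 caption can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the score compares each generated candidate against the defect-free input inside the mask, and it swaps LPIPS for a simple masked mean absolute difference so the example has no dependencies. The names `masked_l1` and `low_fidelity_select` are hypothetical.

```python
from typing import Callable, List, Sequence

# Grayscale image as nested lists of floats (illustrative representation only).
Image = List[List[float]]


def masked_l1(a: Image, b: Image, mask: Image) -> float:
    """Stand-in distance: mean absolute difference over masked pixels.

    ASSUMPTION: the caption specifies LPIPS; this simpler metric is used
    here only to keep the sketch dependency-free.
    """
    total, count = 0.0, 0
    for row_a, row_b, row_m in zip(a, b, mask):
        for pa, pb, m in zip(row_a, row_b, row_m):
            if m:  # only pixels inside the defect mask contribute
                total += abs(pa - pb)
                count += 1
    return total / count if count else 0.0


def low_fidelity_select(
    normal: Image,
    candidates: Sequence[Image],
    mask: Image,
    distance: Callable[[Image, Image, Image], float] = masked_l1,
) -> int:
    """Return the index of the candidate with the LOWEST fidelity,
    i.e. the HIGHEST masked distance to the defect-free input."""
    scores = [distance(normal, c, mask) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy 2x2 example: only the top-left pixel is inside the mask.
normal = [[0.0, 0.0], [0.0, 0.0]]
mask = [[1, 0], [0, 0]]
candidates = [
    [[0.1, 0.9], [0.0, 0.0]],  # large change, but outside the mask
    [[0.8, 0.0], [0.0, 0.0]],  # strongest change inside the mask
    [[0.3, 0.0], [0.0, 0.0]],
]
best = low_fidelity_select(normal, candidates, mask)
```

Passing a perceptual metric as `distance` would bring the sketch closer to the caption's description; the argmax-over-candidates structure stays the same.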