
Defect Spectrum: A Granular Look of Large-Scale Defect Datasets with Rich Semantics

Shuai Yang, Zhifei Chen, Pengguang Chen, Xi Fang, Yixun Liang, Shu Liu, Yingcong Chen

TL;DR

The paper introduces Defect Spectrum, a large, semantically rich industrial defect dataset built on four benchmarks that provides precise, multi-class defect annotations and descriptive captions, addressing the lack of granularity in existing datasets. To overcome data scarcity, it proposes Defect-Gen, a two-stage diffusion-based generator that uses patch-level processing to model global structure with a large receptive field and local detail with a small receptive field. A custom auxiliary tool, Defect-Click, accelerates precise annotation. The authors report improved defect segmentation performance and substantial gains in downstream tasks when synthetic data is incorporated, and they provide visual and quantitative analyses across multiple datasets. This work advances practical defect inspection by enabling finer-grained analysis, richer semantics, and more robust model training in low-data industrial settings.

Abstract

Defect inspection is paramount within the closed-loop manufacturing system. However, existing datasets for defect inspection often lack the precision and semantic granularity required for practical applications. In this paper, we introduce the Defect Spectrum, a comprehensive benchmark that offers precise, semantic-abundant, and large-scale annotations for a wide range of industrial defects. Building on four key industrial benchmarks, our dataset refines existing annotations and introduces rich semantic details, distinguishing multiple defect types within a single image. Furthermore, we introduce Defect-Gen, a two-stage diffusion-based generator designed to create high-quality and diverse defective images, even when working with limited datasets. The synthetic images generated by Defect-Gen significantly enhance the efficacy of defect inspection models. Overall, the Defect Spectrum dataset demonstrates its potential in defect inspection research, offering a solid platform for testing and refining advanced models.

Paper Structure (33 sections, 24 figures, 11 tables)

Figures (24)

  • Figure 1: (a) Identifying the size, position, and type of defects is essential for quality control, as it guides the post-processing of products. Major issues, such as misaligned zipper teeth, necessitate factory rework, whereas minor problems, like fabric snags, can lead to different distribution strategies. This approach ensures the maintenance of product quality and enhances the distribution process. (b) Our annotation is finer and includes defects that are omitted in the source annotation. (c) The source annotations [bergmann2019mvtec, bai2023vision, wieler2007weakly] ignore multiple defective classes within a single image, while ours provides an annotation for each distinct class, shown in different colors. Best viewed in color.
  • Figure 1: Annotation comparison for the "cable" and "capsule" classes in the MVTec dataset. The first row shows the defect images. Rows 2 and 3 show the original annotations and our improved annotations. Best viewed in color.
  • Figure 2: DDPM predicts high density around training samples and fails to capture the true data distribution.
  • Figure 2: Annotation comparison for the "toothbrush" and "hazelnut" classes in the MVTec dataset. The first row shows the defect images. Rows 2 and 3 show the original annotations and our improved annotations. Best viewed in color.
  • Figure 3: The inference process of the two-stage diffusion model. The input to the large model $p_{\theta}$ is Gaussian noise. Once the optimal step is reached, the intermediate result, which contains global information, is used as the input to the small model $p_{\phi}$.
  • ...and 19 more figures
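
The hand-off described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `two_stage_sample`, `switch_step`, `toy_large`, and `toy_small` are hypothetical, the "denoisers" are placeholder functions rather than trained networks, and the switch step is passed in by hand rather than chosen by any procedure from the paper.

```python
import random
from typing import Callable, List, Tuple

Image = List[float]                      # flattened toy "image"
DenoiseStep = Callable[[Image, int], Image]


def two_stage_sample(
    large_step: DenoiseStep,
    small_step: DenoiseStep,
    total_steps: int,
    switch_step: int,
    size: int,
    seed: int = 0,
) -> Tuple[Image, List[str]]:
    """Run reverse diffusion with a hand-off between two denoisers.

    Timesteps run from total_steps - 1 down to 0. The large model handles
    every step t >= switch_step (early, noisy steps), and the small model
    handles the remaining steps t < switch_step.
    """
    if not 0 <= switch_step <= total_steps:
        raise ValueError("switch_step must lie in [0, total_steps]")

    rng = random.Random(seed)
    # Stage 0: start from pure Gaussian noise, as in the caption.
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]

    used: List[str] = []                 # records which model ran at each step
    for t in range(total_steps - 1, -1, -1):
        if t >= switch_step:
            x = large_step(x, t)         # large model: global structure
            used.append("large")
        else:
            x = small_step(x, t)         # small model: local detail
            used.append("small")
    return x, used


# Placeholder denoisers (assumptions for illustration only).
def toy_large(x: Image, t: int) -> Image:
    mean = sum(x) / len(x)
    return [0.5 * (v + mean) for v in x]  # pulls values toward a shared mean


def toy_small(x: Image, t: int) -> Image:
    return [0.9 * v for v in x]           # small per-element adjustment


if __name__ == "__main__":
    sample, used = two_stage_sample(toy_large, toy_small, 10, 4, size=8)
    print(used)
```

The only design point the sketch tries to capture is that the small model does not start from noise: it continues from the intermediate result produced by the large model at the switch step.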