Exploring Few-Shot Defect Segmentation in General Industrial Scenarios with Metric Learning and Vision Foundation Models

Tongkun Liu, Bing Li, Xiao Jin, Yupeng Shi, Qiuying Li, Xiang Wei

TL;DR

The paper addresses the gap in few-shot defect segmentation for general industrial scenarios by introducing a real-world object-based dataset and a comprehensive benchmark that includes textures, single-component, and multi-component products. It evaluates meta-learning and Vision Foundation Model (VFM) approaches, finding meta-learning to be generally unsuitable for cross-domain industrial FDS, while VFMs—especially SAM2 in video-track mode—and a newly proposed feature-matching method with knowledge distillation show strong potential. The proposed method achieves competitive accuracy with higher efficiency by using high-resolution feature representations and a tailored fusion with FastSAM, and SAM2’s video tracking further boosts performance on challenging defects. The work provides practical insights for deploying FDS in industry and lays groundwork for future improvements in dataset coverage and VFM-driven FDS, with publicly available code.

Abstract

Industrial defect segmentation is critical for manufacturing quality control. Due to the scarcity of training defect samples, few-shot semantic segmentation (FSS) holds significant value in this field. However, existing studies mostly apply FSS to tackle defects on simple textures, without considering more diverse scenarios. This paper aims to address this gap by exploring FSS in broader industrial products with various defect types. To this end, we contribute a new real-world dataset and reorganize some existing datasets to build a more comprehensive few-shot defect segmentation (FDS) benchmark. On this benchmark, we thoroughly investigate metric learning-based FSS methods, including those based on meta-learning and those based on Vision Foundation Models (VFMs). We observe that existing meta-learning-based methods are generally not well-suited for this task, while VFMs hold great potential. We further systematically study the applicability of various VFMs in this task, involving two paradigms: feature matching and the use of Segment Anything (SAM) models. We propose a novel efficient FDS method based on feature matching. Meanwhile, we find that SAM2 is particularly effective for addressing FDS through its video track mode. The contributed dataset and code will be available at: https://github.com/liutongkun/GFDS.

Paper Structure

This paper contains 29 sections, 18 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Comparison of existing few-shot defect segmentation (FDS) research with ours. The red boxes indicate the segmentation targets. Current FDS research concentrates on textures, whereas ours focuses on more general industrial scenarios. The left part lists four categories of texture defects from the benchmark [bao2021triplet]; they look alike, appearing as white or black spots. As a result, the base and test samples in meta-learning are highly similar.
  • Figure 2: Examples of ambiguous defect category definitions in MVTec AD [bergmann2019mvtec]. The defects in the left part exhibit clearly different patterns, yet they are assigned to the same categories. Conversely, the defects in the right part look similar but are classified into different categories.
  • Figure 3: Our contributed dataset. The proposed dataset contains three types of rubber ring images: large rubber rings, small rubber rings, and side views of rubber rings, abbreviated as 'R_large', 'R_small', and 'R_side', respectively. They contain a total of nine types of defects.
  • Figure 4: Examples of different product defects selected from existing publicly available datasets.
  • Figure 5: The overview of the proposed feature matching-based FDS method. It primarily consists of three parts: 1. Feature distillation, 2. Feature matching, and 3. Refining the results with FastSAM.
  • ...and 4 more figures
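
The feature-matching step named in the Figure 5 caption can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that per-location features have already been extracted by some backbone, and it uses plain nearest-neighbor cosine matching against a single annotated support image. The function names `l2_normalize` and `match_defects` are hypothetical, and the distillation and FastSAM refinement stages are omitted entirely.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each feature vector (last axis) to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def match_defects(support_feats, support_mask, query_feats):
    """Label query locations by nearest-neighbor cosine matching.

    support_feats: (Hs, Ws, C) features of the annotated support image
    support_mask:  (Hs, Ws) boolean mask, True where the support is defective
    query_feats:   (Hq, Wq, C) features of the image to segment
    Returns a (Hq, Wq) boolean mask (True = predicted defect).

    Assumption: a query location is called defective when its best match
    among defective support features beats its best match among normal ones.
    """
    channels = support_feats.shape[-1]
    support = l2_normalize(support_feats.reshape(-1, channels))
    query = l2_normalize(query_feats.reshape(-1, channels))
    is_defect = support_mask.reshape(-1).astype(bool)
    if not is_defect.any() or is_defect.all():
        raise ValueError("support mask needs both defective and normal pixels")

    # Cosine similarity between every query and every support location.
    similarity = query @ support.T  # shape (Nq, Ns)
    defect_score = similarity[:, is_defect].max(axis=1)
    normal_score = similarity[:, ~is_defect].max(axis=1)
    return (defect_score > normal_score).reshape(query_feats.shape[:2])
```

The dense similarity matrix grows with the product of the query and support resolutions, which is one reason a practical method has to trade feature resolution against efficiency.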