Exploring Few-Shot Defect Segmentation in General Industrial Scenarios with Metric Learning and Vision Foundation Models
Tongkun Liu, Bing Li, Xiao Jin, Yupeng Shi, Qiuying Li, Xiang Wei
TL;DR
The paper addresses the gap in few-shot defect segmentation (FDS) for general industrial scenarios by introducing a real-world object-based dataset and a comprehensive benchmark covering textures, single-component products, and multi-component products. It evaluates meta-learning and Vision Foundation Model (VFM) approaches and finds that existing meta-learning methods are generally not well suited to cross-domain industrial FDS, whereas VFMs show strong potential, especially SAM2 in video track mode and a newly proposed feature-matching method with knowledge distillation. The proposed method achieves competitive accuracy with higher efficiency by using high-resolution feature representations and a tailored fusion with FastSAM, and SAM2’s video tracking further boosts performance on challenging defects. The work provides practical insights for deploying FDS in industry and lays groundwork for future improvements in dataset coverage and VFM-driven FDS; the authors state that the dataset and code will be made publicly available.
Abstract
Industrial defect segmentation is critical for manufacturing quality control. Due to the scarcity of training defect samples, few-shot semantic segmentation (FSS) holds significant value in this field. However, existing studies mostly apply FSS to defects on simple textures, without considering more diverse scenarios. This paper aims to address this gap by exploring FSS across a broader range of industrial products with various defect types. To this end, we contribute a new real-world dataset and reorganize some existing datasets to build a more comprehensive few-shot defect segmentation (FDS) benchmark. On this benchmark, we thoroughly investigate metric learning-based FSS methods, including those based on meta-learning and those based on Vision Foundation Models (VFMs). We observe that existing meta-learning-based methods are generally not well-suited for this task, while VFMs hold great potential. We further systematically study the applicability of various VFMs to this task under two paradigms: feature matching and the use of Segment Anything (SAM) models. We propose a novel, efficient FDS method based on feature matching. We also find that SAM2 is particularly effective for addressing FDS through its video track mode. The contributed dataset and code will be available at: https://github.com/liutongkun/GFDS.
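To make the feature-matching paradigm concrete, the sketch below shows a generic nearest-neighbour cosine matching step between support and query feature maps. It is an illustration only and not the method proposed in the paper: the function names (`l2_normalize`, `match_defects`), the threshold value, and the toy features are all assumptions, and in practice the features would come from a VFM backbone rather than hand-written arrays.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale feature vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def match_defects(support_feats, support_mask, query_feats, threshold=0.8):
    """Hypothetical helper: score each query location by its best cosine
    similarity to any support location labelled as defect.

    support_feats: (Hs, Ws, C) feature map of the annotated support image
    support_mask:  (Hs, Ws) binary defect mask at feature resolution
    query_feats:   (Hq, Wq, C) feature map of the query image
    threshold:     assumed cut-off on the similarity score (arbitrary here)
    """
    channels = support_feats.shape[-1]
    defect_feats = support_feats[support_mask.astype(bool)]  # (N, C)
    if defect_feats.shape[0] == 0:
        raise ValueError("support mask contains no defect locations")

    defect_feats = l2_normalize(defect_feats)
    query_flat = l2_normalize(query_feats.reshape(-1, channels))  # (Hq*Wq, C)

    # Cosine similarity of every query location to every defect prototype.
    similarity = query_flat @ defect_feats.T  # (Hq*Wq, N)

    # Keep the best match per query location and restore the spatial layout.
    score = similarity.max(axis=1).reshape(query_feats.shape[:2])
    return score, score >= threshold

# Toy example: one defect location in the support image with feature [1, 0].
support_feats = np.array([[[1.0, 0.0], [0.0, 1.0]],
                          [[0.0, 1.0], [0.0, 1.0]]])
support_mask = np.array([[1, 0],
                         [0, 0]])
query_feats = np.array([[[0.0, 1.0], [0.0, 1.0]],
                        [[0.0, 1.0], [2.0, 0.0]]])

score, pred_mask = match_defects(support_feats, support_mask, query_feats)
print(pred_mask)
```

In this toy case only the bottom-right query location points in the same direction as the support defect feature, so it is the only one flagged. A fixed global threshold is the simplest possible decision rule; a practical system would likely need calibration per product or a learned refinement stage.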
