LERENet: Eliminating Intra-class Differences for Metal Surface Defect Few-shot Semantic Segmentation

Hanze Ding, Zhangkai Wu, Jiyan Zhang, Ming Ping, Yanfang Liu

TL;DR

This work addresses the challenge of few-shot semantic segmentation for metal surface defects, where intra-class variations (semantic and distortion differences) undermine cross-sample guidance. It introduces LERENet, a dual-view framework consisting of Multi-Prototype Reasoning (MPR), which operates on local descriptors within a graph space, and Multi-Prototype Excitation (MPE), which leverages global-edge information in the feature space; an Information Fusion Module (IFM) fuses the two to produce precise pixel-level masks. Two intra-class differences are defined and addressed explicitly. Experiments on the Surface Defect-$4^{i}$ dataset demonstrate state-of-the-art performance in 1-shot and 5-shot scenarios, and ablations show that MPR and MPE are complementary. The approach offers a data-efficient solution for industrial defect inspection, where labeling is expensive and defects vary across processes and imaging conditions.
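The 1-shot and 5-shot scenarios refer to the episodic few-shot protocol: each episode pairs K annotated support images of one class with a query image of the same class, and test classes are unseen during training. The sketch below is a generic illustration of that protocol, not the paper's data loader. The function names, the class-to-image index, and the contiguous fold split (an assumption borrowed from the PASCAL-5^i naming convention, in which the superscript i indexes a held-out fold of test classes) are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical helpers illustrating the generic episodic protocol;
# they are not taken from the LERENet code base.


def fold_split(classes: List[str], n_folds: int, test_fold: int) -> Tuple[List[str], List[str]]:
    """Split classes into contiguous folds and hold out one fold as unseen test classes.

    Assumption: classes that do not fit evenly into a fold always stay in training.
    """
    per_fold = len(classes) // n_folds
    test = classes[test_fold * per_fold:(test_fold + 1) * per_fold]
    train = [c for c in classes if c not in test]
    return train, test


def sample_episode(
    index: Dict[str, List[str]],  # hypothetical: class name -> image ids containing it
    classes: List[str],           # classes allowed in this split
    k_shot: int,                  # 1 or 5 in the settings reported above
    rng: random.Random,
) -> Tuple[str, List[str], str]:
    """Return (class, K support image ids, one query image id) for a single episode."""
    cls = rng.choice(classes)
    # Draw K + 1 distinct images so the query is never one of the support images.
    picks = rng.sample(index[cls], k_shot + 1)
    return cls, picks[:k_shot], picks[k_shot]
```

For example, with a hypothetical four-fold split, `fold_split(classes, n_folds=4, test_fold=0)` yields training classes from three folds, and evaluation episodes are then drawn with `sample_episode` from the held-out classes only. The actual fold layout of Surface Defect-$4^{i}$ is not stated in this section.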

Abstract

Few-shot segmentation models are well suited to metal defect detection: they generalize rapidly to new classes and produce pixel-level segmentations, which makes them a natural fit for the data scarcity and fine object delineation encountered in industrial applications. Existing works neglect the Intra-Class Differences inherent in metal surface defect data, which prevent the model from learning sufficient knowledge from the support set to guide segmentation of the query set. These differences fall into two types: the Semantic Difference, induced by internal factors of the metal samples, and the Distortion Difference, caused by external factors of the surroundings. To address them, we introduce a Local dEscriptor-based Reasoning and Excitation Network (LERENet) that learns two-view guidance, i.e., local and global information from the graph and feature spaces, and fuses the two for precise segmentation. Since the relation structure of local features embedded in graph space helps eliminate the Semantic Difference, we employ a Multi-Prototype Reasoning (MPR) module, which extracts prototypes from local descriptors and analyzes local-view feature relevance in support-query pairs. Because global information helps counter the Distortion Difference in observations, we further use a Multi-Prototype Excitation (MPE) module to capture global-view relations in support-query pairs. Finally, an Information Fusion Module (IFM) fuses the prototypes learned in the local and global views to generate pixel-level masks. Comprehensive experiments on defect datasets show that LERENet outperforms existing methods, establishing a new state of the art.
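The abstract says MPR extracts prototypes from local descriptors and analyzes local-view feature relevance between support and query in a graph space, but it does not give the graph construction. The NumPy sketch below assumes the simplest version: a bipartite graph linking every query descriptor to every support prototype, with edge weights given by temperature-scaled cosine similarity, followed by one message-passing step that pulls prototype features onto the query grid. The function names and the temperature `tau` are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors along `axis` to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def prototype_query_graph(query_feat: np.ndarray, prototypes: np.ndarray, tau: float = 0.1):
    """Hypothetical local-view relevance between query descriptors and support prototypes.

    query_feat: (C, H, W) query features; each spatial position is one local descriptor.
    prototypes: (N, C) support prototypes.
    Returns:
      adj:        (H*W, N) row-normalized bipartite adjacency (softmax over prototypes)
      aggregated: (C, H, W) prototype features propagated to every query location
      relevance:  (H, W) max cosine similarity of each query descriptor to any prototype
    """
    c, h, w = query_feat.shape
    q = l2_normalize(query_feat.reshape(c, h * w).T)   # (HW, C)
    p = l2_normalize(prototypes)                        # (N, C)
    sim = q @ p.T                                       # (HW, N) cosine similarities
    logits = sim / tau
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability for exp
    adj = np.exp(logits)
    adj /= adj.sum(axis=1, keepdims=True)               # each row sums to 1
    aggregated = (adj @ prototypes).T.reshape(c, h, w)  # one message-passing step
    relevance = sim.max(axis=1).reshape(h, w)
    return adj, aggregated, relevance
```

The relevance map behaves like a prior mask: query locations whose descriptors closely match some support prototype score near 1. A small `tau` makes each query location attend mostly to its closest prototype, while a large one spreads the weight evenly.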

Paper Structure (13 sections, 15 equations, 6 figures, 3 tables, 1 algorithm)

Figures (6)

  • Figure 1: Two categories of intra-class differences observed in metal defect data. The semantic intra-class difference is illustrated with three support-query pairs, in which defects of the same category vary at a fine-grained level. For instance, defects within the same class, such as Steel, Rail, or Al (aluminum), may look different under different manufacturing processes or lighting conditions, or when corrupted by various types of noise. The distortion intra-class difference is illustrated with three further support-query pairs. Here, the differences are induced by lens distortion or by the viewpoint from which the image is taken, which change the shape, scale, and orientation of the defect instance.
  • Figure 2: Comparison of traditional approaches and our LERENet. First, traditional models rely on image-level prototypes, whereas LERENet employs multiple prototypes at the local-descriptor level to represent more implicit local relations. Second, our model generates features in a local view (by the Reasoning operation) and a global view (by the Excitation and Global Edge Information operations). After acquiring the local-view graph-space features (yellow square) and the global-view features (blue squares), the two types of differences are addressed separately.
  • Figure 3: LERENet for 5-shot segmentation. (1) represents the Multi-Prototype Reasoning process and (2) denotes Multi-Prototype Excitation. Given $\boldsymbol{P}_\mathrm{main}$ and $\boldsymbol{P}_\mathrm{aux}$ from these steps, (3) the Information Fusion Module produces the prediction $\tilde{\boldsymbol{M}}_{q}$. The model is trained with a binary cross-entropy (BCE) loss.
  • Figure 4: The overall structure of CBAM. The upper part is the channel attention module, and the lower part is the spatial attention module.
  • Figure 5: Comparison of visual segmentation results. Zoom in for details.
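Figure 2 contrasts a single image-level prototype with multiple prototypes at the local-descriptor level. The NumPy sketch below illustrates that contrast under stated assumptions: the image-level prototype is computed with masked average pooling, a common choice in prototype-based few-shot segmentation, and the multiple prototypes come from k-means over foreground descriptors, a swapped-in technique since the caption does not say how LERENet builds them. All function names are hypothetical.

```python
import numpy as np


def masked_average_pooling(feat: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Image-level prototype: mean descriptor inside the support mask.

    feat: (C, H, W) support feature map; mask: (H, W) binary foreground mask.
    """
    weights = mask.reshape(-1).astype(float)          # (HW,)
    desc = feat.reshape(feat.shape[0], -1)            # (C, HW)
    return desc @ weights / (weights.sum() + 1e-8)    # (C,)


def local_prototypes(feat: np.ndarray, mask: np.ndarray, n_protos: int,
                     n_iters: int = 10, seed: int = 0) -> np.ndarray:
    """Several prototypes from foreground local descriptors via plain k-means.

    Assumption: k-means is a stand-in; the paper's prototype construction may differ.
    """
    # Keep only foreground descriptors, one row per spatial position: (M, C).
    desc = feat.reshape(feat.shape[0], -1).T[mask.reshape(-1) > 0].astype(float)
    rng = np.random.default_rng(seed)
    centers = desc[rng.choice(len(desc), size=n_protos, replace=False)]  # (K, C)
    for _ in range(n_iters):
        dist = ((desc[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (M, K)
        assign = dist.argmin(axis=1)
        for k in range(n_protos):
            members = desc[assign == k]
            if len(members) > 0:  # keep the previous center if a cluster is empty
                centers[k] = members.mean(axis=0)
    return centers
```

On a support mask whose foreground mixes two distinct textures, masked average pooling returns their mean, a vector that matches neither texture, whereas two local prototypes recover each texture separately. This illustrates why prototypes at the local-descriptor level can retain finer intra-class structure.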
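Figure 3 states that the model is trained with a binary cross-entropy (BCE) loss on the predicted query mask $\tilde{\boldsymbol{M}}_{q}$. A minimal NumPy sketch, assuming the network outputs one foreground logit per pixel (the caption does not state the output parameterization); the function name is hypothetical. It uses the numerically stable form $\max(x,0) - xy + \log(1 + e^{-|x|})$, which equals $-[y\log\sigma(x) + (1-y)\log(1-\sigma(x))]$.

```python
import numpy as np


def bce_with_logits(logits: np.ndarray, target: np.ndarray) -> float:
    """Mean pixel-wise BCE between mask logits and a {0, 1} ground-truth mask.

    Assumption: one foreground logit per pixel; the stable form below avoids
    overflow in exp() for large-magnitude logits.
    """
    x = logits.astype(float)
    y = target.astype(float)
    loss = np.maximum(x, 0.0) - x * y + np.log1p(np.exp(-np.abs(x)))
    return float(loss.mean())
```

With all-zero logits the loss equals log 2 ≈ 0.693 regardless of the target, the value expected from a predictor that assigns probability 0.5 to every pixel.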
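Figure 4 refers to CBAM, the Convolutional Block Attention Module of Woo et al. (ECCV 2018), which applies channel attention followed by spatial attention. The sketch below follows that published design in NumPy: a shared two-layer MLP over average- and max-pooled channel descriptors, then a 7×7 convolution over channel-wise average and max maps. The weights are random placeholders, biases are omitted, and the function names are hypothetical; where LERENet places CBAM is not stated in this caption.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """feat (C, H, W); w1 (C//r, C), w2 (C, C//r): shared two-layer MLP with ReLU."""
    avg = feat.mean(axis=(1, 2))                     # (C,) average-pooled descriptor
    mx = feat.max(axis=(1, 2))                       # (C,) max-pooled descriptor
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)     # same weights for both branches
    return sigmoid(mlp(avg) + mlp(mx))               # (C,) weights in (0, 1)


def spatial_attention(feat: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """kernel (2, k, k): cross-correlation over [channel-avg; channel-max], 'same' padding."""
    stacked = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = feat.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)                              # (H, W) weights in (0, 1)


def cbam(feat: np.ndarray, w1: np.ndarray, w2: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Channel attention first, then spatial attention on the reweighted features."""
    feat = feat * channel_attention(feat, w1, w2)[:, None, None]
    return feat * spatial_attention(feat, kernel)[None, :, :]
```

With all weights set to zero, both attention maps equal sigmoid(0) = 0.5, so the output is exactly a quarter of the input; this makes a convenient sanity check before plugging in trained weights.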