Joint Attention-Guided Feature Fusion Network for Saliency Detection of Surface Defects

Xiaoheng Jiang, Feng Yan, Yang Lu, Ke Wang, Shuai Guo, Tianzhu Zhang, Yanwei Pang, Jianwei Niu, Mingliang Xu

TL;DR

This work tackles surface defect saliency detection under challenging real-world conditions like scale variation and low contrast. It introduces JAFFNet, which combines a joint attention-guided feature fusion (JAFF) module and a dense receptive field (DRF) module within an encoder–decoder framework to adaptively fuse multi-level features and expand contextual receptive fields. The dual attention mechanism in JAFF refines low-level defect cues using high-level semantic guidance, while DRF aggregates local and global context through densely connected multi-receptive-field units. Across SD-saliency-900, Magnetic tile, and DAGM 2007 datasets, JAFFNet achieves state-of-the-art performance with a real-time inference speed of 66 FPS, demonstrating strong practical potential for industrial defect inspection.

Abstract

Surface defect inspection plays an important role in industrial manufacturing and production. Although convolutional neural network (CNN) based defect inspection methods have advanced considerably, they still face challenges such as defect scale variation, complex backgrounds, and low contrast. To address these issues, we propose a joint attention-guided feature fusion network (JAFFNet) for saliency detection of surface defects, built on an encoder-decoder network. JAFFNet incorporates a joint attention-guided feature fusion (JAFF) module into the decoding stages to adaptively fuse low-level and high-level features. The JAFF module learns to emphasize defect features and suppress background noise during feature fusion, which is beneficial for detecting low-contrast defects. To guide the fusion, it uses a learned joint channel-spatial attention map derived from high-level semantic features, which directs the model's attention toward defect features. In addition, JAFFNet introduces a dense receptive field (DRF) module after the encoder to capture features with rich context information, which helps detect defects of different scales. The DRF module is a sequence of multi-receptive-field (MRF) units, each taking as input the original input and the feature maps of all preceding MRF units. The resulting DRF features capture rich context information over a large range of receptive fields. Extensive experiments on SD-saliency-900, Magnetic tile, and DAGM 2007 indicate that our method achieves promising performance in comparison with other state-of-the-art methods, while reaching a real-time defect detection speed of 66 FPS.

Paper Structure (32 sections, 16 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Challenges of surface defect inspection. (a) and (b): defects with different scales. (c): defects with low contrast. (a) and (d): interference factors in the background. Defects and interference factors are marked with red and yellow rectangles, respectively.
  • Figure 2: Architecture of the proposed network. The model consists of an encoder and a decoder. The five encoding stages $E_1 \sim E_5$ produce multi-level features with 64, 128, 256, 512, and 512 channels, respectively. $D_1 \sim D_4$ denote the four decoding stages, each comprising a joint attention-guided feature fusion (JAFF) module and a convolution block. The JAFF module fuses high-level and low-level features; it incorporates a dual attention module, consisting of a channel attention branch (CAB) and a spatial attention branch (SAB), that generates the learned channel-spatial attention map used to guide feature fusion. The dense receptive field (DRF) module after the encoder captures dense context information. "DS" and "r" denote depthwise separable convolution and the dilation rate of a dilated convolution, respectively.
  • Figure 3: Illustration of the Dense Receptive Field (DRF) module. The DRF module densely connects a chain of MRF units, each consisting of three parallel $3 \times 3$ dilated convolutions with dilation rates (1, 2, 4).
  • Figure 4: Qualitative comparisons for saliency maps of different methods on three defect datasets: SD-saliency-900 (the 1st$\sim$3rd rows), Magnetic tile (the 4th$\sim$6th rows), DAGM 2007 (the 7th$\sim$9th rows). (a)$\sim$(j) represent LWNet, C2FNet, AttaNet, PFSNet, EDN, PGNet, CSFNet, EDRNet, DACNet and Ours, respectively.
  • Figure 5: The PR curves and F-measure curves of different methods on three defect datasets. (a), (b) and (c) represent SD-saliency-900 ($\rho$=20%), Magnetic tile and DAGM 2007, respectively.
  • ...and 3 more figures
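
The DRF description in the Figure 3 caption can be made concrete with some receptive-field arithmetic. The sketch below is only an illustration, not the authors' implementation. It assumes stride-1 convolutions, ignores any other layers inside an MRF unit, and measures receptive fields relative to the DRF input rather than the image. The number of MRF units is not stated here, so `num_units` is a hypothetical parameter, and the function names are invented for this example. A $3 \times 3$ convolution with dilation rate $r$ spans $3 + 2(r - 1)$ positions, and dense connections let each unit build on the original input and on every earlier unit.

```python
def effective_kernel(kernel: int, rate: int) -> int:
    """Spatial extent covered by a dilated convolution kernel."""
    return kernel + (kernel - 1) * (rate - 1)


def drf_receptive_fields(num_units: int, rates=(1, 2, 4), kernel: int = 3):
    """Receptive-field sizes reachable at the output of each MRF unit.

    Assumptions (not taken from the paper): stride 1 everywhere, and each
    unit exposes the outputs of all of its parallel dilated branches.
    """
    available = {1}  # the original input covers one position of itself
    per_unit = []
    for _ in range(num_units):
        # Each branch widens every available receptive field by (k_eff - 1).
        out = {
            rf + effective_kernel(kernel, r) - 1
            for rf in available
            for r in rates
        }
        per_unit.append(sorted(out))
        # Dense connection: later units also see all earlier outputs.
        available |= out
    return per_unit


if __name__ == "__main__":
    # num_units=3 is an arbitrary choice for illustration.
    for i, rfs in enumerate(drf_receptive_fields(num_units=3), start=1):
        print(f"MRF unit {i}: {rfs}")
```

Under these assumptions, a single unit yields receptive fields of 3, 5, and 9, and each additional densely connected unit fills in intermediate sizes while extending the maximum by 8, which is one way to read the claim that DRF features cover a large range of receptive fields.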