Joint Attention-Guided Feature Fusion Network for Saliency Detection of Surface Defects
Xiaoheng Jiang, Feng Yan, Yang Lu, Ke Wang, Shuai Guo, Tianzhu Zhang, Yanwei Pang, Jianwei Niu, Mingliang Xu
TL;DR
This work tackles saliency detection of surface defects under challenging real-world conditions such as scale variation and low contrast. It introduces JAFFNet, which combines a joint attention-guided feature fusion (JAFF) module and a dense receptive field (DRF) module within an encoder-decoder framework to adaptively fuse multi-level features and enlarge the range of receptive fields. The joint channel-spatial attention in JAFF uses high-level semantic features to guide fusion, emphasizing low-level defect cues and suppressing background noise, while DRF aggregates context at multiple scales through densely connected multi-receptive-field (MRF) units. On the SD-saliency-900, Magnetic tile, and DAGM 2007 datasets, JAFFNet performs favorably against other state-of-the-art methods while running in real time at 66 FPS, which suggests practical potential for industrial defect inspection.
Abstract
Surface defect inspection plays an important role in industrial manufacturing and production. Although convolutional neural network (CNN) based defect inspection methods have advanced considerably, they still face challenges such as defect scale variation, complex backgrounds, and low contrast. To address these issues, we propose a joint attention-guided feature fusion network (JAFFNet) for saliency detection of surface defects, built on an encoder-decoder architecture. JAFFNet incorporates a joint attention-guided feature fusion (JAFF) module into the decoding stages to adaptively fuse low-level and high-level features. The JAFF module uses a learned joint channel-spatial attention map, derived from high-level semantic features, to guide feature fusion. This attention map emphasizes defect features and suppresses background noise during fusion, which is beneficial for detecting low-contrast defects. In addition, JAFFNet introduces a dense receptive field (DRF) module after the encoder to capture features with rich context information, which helps detect defects of different scales. The DRF module consists of a sequence of multi-receptive-field (MRF) units, each of which takes as input the original input together with the feature maps of all preceding MRF units. The resulting DRF features capture rich context information over a wide range of receptive fields. Extensive experiments on SD-saliency-900, Magnetic tile, and DAGM 2007 indicate that our method achieves promising performance in comparison with other state-of-the-art methods. Meanwhile, our method reaches a real-time defect detection speed of 66 FPS.
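The two ideas described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract says the attention map is learned, whereas this sketch swaps in fixed pooling followed by a sigmoid, and it assumes the low-level and high-level features already share the same shape. The gated-sum fusion rule, the function names (`joint_attention`, `guided_fusion`, `drf_input_channels`), and the `growth` parameter are all hypothetical choices made for illustration only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def joint_attention(high):
    """Build a C x H x W attention map from high-level features.

    `high` is a list of C channels, each an H x W nested list.
    ASSUMPTION: pooling + sigmoid stands in for the learned layers.
    """
    C, H, W = len(high), len(high[0]), len(high[0][0])
    # Channel attention: one weight per channel from its global average.
    channel = [sigmoid(sum(sum(row) for row in ch) / (H * W)) for ch in high]
    # Spatial attention: one weight per pixel from the mean over channels.
    spatial = [[sigmoid(sum(high[c][i][j] for c in range(C)) / C)
                for j in range(W)] for i in range(H)]
    # Joint map: product of the channel weight and the spatial weight.
    return [[[channel[c] * spatial[i][j] for j in range(W)]
             for i in range(H)] for c in range(C)]


def guided_fusion(low, high):
    """Gate low-level features with the joint map, then add high-level ones.

    ASSUMPTION: `low` and `high` have identical shapes, and fusion is a
    gated sum; the paper's actual fusion rule may differ.
    """
    att = joint_attention(high)
    C, H, W = len(high), len(high[0]), len(high[0][0])
    return [[[att[c][i][j] * low[c][i][j] + high[c][i][j] for j in range(W)]
             for i in range(H)] for c in range(C)]


def drf_input_channels(in_channels, growth, num_units):
    """Input width of each MRF unit under dense connectivity.

    Unit k sees the original input plus the outputs of the k preceding
    units. ASSUMPTION: every unit emits `growth` channels.
    """
    return [in_channels + k * growth for k in range(num_units)]
```

For example, with a hypothetical 64-channel encoder output and four MRF units that each emit 32 channels, `drf_input_channels(64, 32, 4)` gives the concatenated input width seen by each successive unit, which shows how later units have access to features from every earlier receptive-field setting.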
