Fab-ME: A Vision State-Space and Attention-Enhanced Framework for Fabric Defect Detection

Shuai Wang, Huiyan Kong, Baotian Li, Fa Zheng

TL;DR

Fab-ME addresses fabric defect detection by integrating Vision State-Space blocks and enhanced attention into a YOLOv8s-based detector, improving both global context modeling and multi-scale feature discrimination. The C2F-VMamba module in the neck and the EMCA module in the backbone together raise detection accuracy across 20 fabric defect types, reaching 59.4% mAP@0.5 on the Tianchi dataset, about 3.5 percentage points higher than the baseline YOLOv8s, while preserving real-time performance. Ablation studies confirm the individual and combined contributions of C2F-VMamba and EMCA to accuracy and efficiency. Overall, Fab-ME offers a robust, fast solution for industrial fabric inspection with strong potential for deployment in intelligent manufacturing environments.

Abstract

Effective defect detection is critical for ensuring the quality, functionality, and economic value of textile products. However, existing methods face challenges in achieving high accuracy, real-time performance, and efficient global information extraction. To address these issues, we propose Fab-ME, an advanced framework based on YOLOv8s, specifically designed for the accurate detection of 20 fabric defect types. Our contributions include the introduction of the cross-stage partial bottleneck with two convolutions (C2F) vision state-space (C2F-VMamba) module, which integrates visual state-space (VSS) blocks into the YOLOv8s feature fusion network neck, enhancing the capture of intricate details and global context while maintaining high processing speeds. Additionally, we incorporate an enhanced multi-scale channel attention (EMCA) module into the final layer of the feature extraction network, significantly improving sensitivity to small targets. Experimental results on the Tianchi fabric defect detection dataset demonstrate that Fab-ME achieves a 3.5% improvement in mAP@0.5 compared to the original YOLOv8s, validating its effectiveness for precise and efficient fabric defect detection.
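The reported figures can be reconciled with a short calculation. The sketch below assumes, following the TL;DR, that the 3.5 gain is an absolute difference in percentage points of mAP@0.5. The function name `baseline_and_relative_gain` is hypothetical and used only for illustration; the baseline value it derives is implied by the two reported numbers, not quoted from the paper.

```python
def baseline_and_relative_gain(final_map, gain_points):
    """Derive the implied baseline mAP and the relative gain (in percent).

    Assumes `gain_points` is an absolute difference in percentage points,
    which is how the TL;DR describes the improvement.
    """
    baseline = final_map - gain_points
    relative_gain = gain_points / baseline * 100.0
    return round(baseline, 1), round(relative_gain, 2)


# Reported values: 59.4% mAP@0.5 for Fab-ME, 3.5 points above YOLOv8s.
baseline, relative = baseline_and_relative_gain(59.4, 3.5)
print(f"implied baseline mAP@0.5: {baseline}%")
print(f"relative improvement: {relative}%")
```

Under this reading, the implied YOLOv8s baseline is about 55.9% mAP@0.5, and the same gain expressed relative to that baseline is roughly 6.3%, which is why the "3.5%" in the abstract is best read as percentage points.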

Paper Structure

This paper contains 17 sections, 6 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The proposed Fab-ME framework. The feature maps generated at each stage of the backbone network are denoted as $c_1$ through $c_4$. C2F stands for "CSP Bottleneck with Two Convolutions". (a) C2F-VMamba. (b) Visual State-Space (VSS). (c) Enhanced multi-scale channel attention (EMCA).
  • Figure 2: Sample fabric defect images from the Tianchi fabric dataset.
  • Figure 3: Ablation study of key components in our method. (a) Ablation experiments of substituting the C2F modules at different positions within the Neck with the C2F-VMamba module. In the neck module illustrated in Fig. 1, the C2F modules are sequentially labeled from top to bottom as C2F1 through C2F4. Replacing each C2F module with the C2F-VMamba module results in models (2) through (5). (b) Ablation experiments of EMCA. (c) Ablation study of key components in our method.
  • Figure 4: Ablation studies of our proposed module were conducted on YOLOv5s, YOLOv6s, and YOLOv8s.
  • Figure 5: Visualization results comparing our method with Faster R-CNN, YOLOv5s, and the baseline. The numbers near the boxes represent the category numbers of the defects; the category names and colors can be found in the Name and Color column of the defect samples table.