Staircase Cascaded Fusion of Lightweight Local Pattern Recognition and Long-Range Dependencies for Structural Crack Segmentation

Hui Liu, Chen Jia, Fan Shi, Xu Cheng, Mianzhao Wang, Shengyong Chen, Yang Lv

TL;DR

The experimental results show that the CrackSCF method consistently outperforms the existing methods, and it demonstrates greater robustness in dealing with complex background noise.

Abstract

Accurately segmenting structural cracks at the pixel level remains a major hurdle: existing methods fail to integrate local textures with long-range pixel dependencies, which often leads to fragmented and incomplete predictions. Moreover, their high parameter counts and substantial computational demands hinder practical deployment on resource-constrained edge devices. To address these challenges, we propose CrackSCF, a Lightweight Cascaded Fusion Crack Segmentation Network designed to achieve robust crack segmentation with high computational efficiency. We design a lightweight convolutional block (LRDS) to replace all standard convolutions; it captures local patterns efficiently with a minimal computational footprint. For a holistic perception of crack structures, a lightweight Long-range Dependency Extractor (LDE) captures global dependencies. These are then fused with local patterns by our Staircase Cascaded Fusion Module (SCFM), so that the final segmentation maps are both continuous and rich in fine-grained detail. To comprehensively evaluate our method, we created the challenging TUT benchmark dataset and evaluated CrackSCF on it alongside five other public datasets. The experimental results show that CrackSCF consistently outperforms existing methods and is more robust to complex background noise. On the TUT dataset, CrackSCF achieved an F1 score of 0.8382 and an mIoU of 0.8473 while requiring only 4.79M parameters.

Paper Structure (15 sections, 18 equations, 10 figures, 17 tables)

Figures (10)

  • Figure 1: Performance comparison of our proposed method with UCTNet [wang2022uctransnet], SFIAN [cheng2023selective], CT-crackseg [tao2023convolutional], DTrCNet [xiang2023crack], Crackmer [wang2024dual], and Simcrack [jaziri2024designing] on the TUT dataset in terms of mIoU, FLOPs, and Params.
  • Figure 2: General Architecture Diagram of CrackSCF Network. Crack images are first input into the MFE, generating four feature maps. After enhancement, five feature maps with unified channel numbers are obtained. The LDE processes these maps to acquire pixel sequences rich in long-range correlations. The SCFM then processes these sequences and the four layers of feature maps from the MFE through four stages, producing the segmentation output.
  • Figure 3: Comparison of FLOPs and Params for each module using the original convolution and our proposed LRDS convolution block.
  • Figure 4: Illustration of LRDS convolution block. After the number of channels of the input features is reduced, their local and spatial features are efficiently extracted by depth-wise convolution and point-wise convolution, and finally the number of channels is restored.
  • Figure 5: Analysis of Computational Resources Required for Linear Operations. (a) shows the share of FLOPs and Params in the LDE module attributable to the original linear operations, with the inner circle showing FLOPs and the outer circle showing Params. (b) compares the FLOPs and Params of the LDE before and after using LR Linear.
  • ...and 5 more figures
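
The captions for Figures 3 to 5 describe where the parameter savings come from: the LRDS block reduces the channel count, applies depth-wise and point-wise convolutions, and then restores the channels, while LR Linear replaces the original linear operations in the LDE. The sketch below is a rough back-of-the-envelope parameter count for that kind of design, not the authors' implementation. The reduction ratio, kernel size, channel widths, and rank are hypothetical values chosen for illustration, and reading "LR Linear" as a low-rank factorization of a linear layer is an assumption that this page does not confirm.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution (bias omitted)."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k * k


def lrds_like_params(c_in, c_out, k=3, reduction=4):
    """Reduce -> depth-wise -> point-wise -> restore, as in the Figure 4 caption.

    `reduction` is a hypothetical ratio; the page does not state the real one.
    """
    c_mid = max(c_in // reduction, 1)
    reduce = conv_params(c_in, c_mid, 1)                    # 1x1 channel reduction
    depthwise = conv_params(c_mid, c_mid, k, groups=c_mid)  # one k x k filter per channel
    pointwise = conv_params(c_mid, c_mid, 1)                # 1x1 channel mixing
    restore = conv_params(c_mid, c_out, 1)                  # 1x1 channel restoration
    return reduce + depthwise + pointwise + restore


def linear_params(d_in, d_out):
    """Weight count of a dense linear layer (bias omitted)."""
    return d_in * d_out


def low_rank_linear_params(d_in, d_out, rank):
    """Assumed reading of LR Linear: W (d_in x d_out) ~ A (d_in x rank) @ B (rank x d_out)."""
    return d_in * rank + rank * d_out


# Hypothetical layer sizes, for illustration only.
standard_conv = conv_params(64, 64, 3)
light_conv = lrds_like_params(64, 64)
dense = linear_params(256, 256)
low_rank = low_rank_linear_params(256, 256, rank=32)

print(f"3x3 conv: {standard_conv} weights, LRDS-like block: {light_conv} weights")
print(f"dense linear: {dense} weights, low-rank linear: {low_rank} weights")
```

With these illustrative sizes, the bottlenecked block needs far fewer weights than a single 3x3 convolution because the only k x k filters are applied per channel on the reduced width. The actual savings reported in Figures 3 and 5 depend on the real layer sizes, which are not given on this page.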