FlexiCrackNet: A Flexible Pipeline for Enhanced Crack Segmentation with General Features Transfered from SAM

Xinlong Wan, Xiaoyan Jiang, Guangsheng Luo, Ferdous Sohel, Jenqneng Hwang

TL;DR

FlexiCrackNet tackles crack segmentation under diverse real-world conditions by fusing general visual priors from a lightweight, frozen EdgeSAM encoder with task-specific crack features through the information-interaction gated attention mechanism (IGAM). The three-module pipeline accepts flexible input sizes and deploys efficiently on resource-constrained devices while generalizing well, including zero-shot performance on cross-domain datasets. Key contributions include the IGAM feature fusion strategy, a scaling module, and empirical evidence of superior accuracy and efficiency over state-of-the-art methods on DeepCrack, CFD, and Crack500. The approach targets automated crack detection and structural health monitoring in real-world infrastructure systems.

Abstract

Automatic crack segmentation is a cornerstone technology for intelligent visual perception modules in road safety maintenance and structural integrity systems. Existing deep learning models and "pre-training + fine-tuning" paradigms often adapt poorly to resource-constrained environments and scale inadequately across diverse data domains. To overcome these limitations, we propose FlexiCrackNet, a novel pipeline that integrates traditional deep learning paradigms with the strengths of large-scale pre-trained models. At its core, FlexiCrackNet employs an encoder-decoder architecture to extract task-specific features. The CNN-based encoder of the lightweight EdgeSAM is used exclusively as a generic feature extractor, decoupled from the fixed input size requirements of EdgeSAM. To harmonize general and domain-specific features, we introduce the information-interaction gated attention mechanism (IGAM), which adaptively fuses multi-level features to enhance segmentation performance while mitigating irrelevant noise. This design enables the efficient transfer of general knowledge to crack segmentation tasks while ensuring adaptability to diverse input resolutions and resource-constrained environments. Experiments show that FlexiCrackNet outperforms state-of-the-art methods and excels in zero-shot generalization, computational efficiency, and segmentation robustness under challenging scenarios such as blurry inputs, complex backgrounds, and visually ambiguous artifacts. These advancements underscore the potential of FlexiCrackNet for real-world applications in automated crack detection and comprehensive structural health monitoring systems.
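
The idea of gating general features before adding them to task-specific features can be illustrated with a small NumPy sketch. This is not the paper's IGAM, whose internals are not given in this summary; it is a generic gated fusion under stated assumptions. The function names `resize_nearest` and `gated_fusion`, the single-channel spatial gate, the hypothetical 1x1-convolution weights `w_gate` and `b_gate`, and the equal channel counts of the two branches are all illustrative choices.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def resize_nearest(feat, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) feature map to (C, out_h, out_w)."""
    _, h, w = feat.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return feat[:, rows[:, None], cols[None, :]]


def gated_fusion(task_feat, general_feat, w_gate, b_gate):
    """Add general features to task features, scaled by a learned spatial gate.

    task_feat:    (C, H, W) crack-specific features from the trainable branch
    general_feat: (C, H2, W2) features from a frozen general-purpose encoder
                  (assumption: same channel count C as the task branch)
    w_gate:       (2C,) weights of a hypothetical 1x1 convolution
    b_gate:       scalar bias of that convolution
    """
    _, h, w = task_feat.shape
    # Align spatial sizes so the two branches may run at different resolutions.
    general = resize_nearest(general_feat, h, w)
    stacked = np.concatenate([task_feat, general], axis=0)  # (2C, H, W)
    # A 1x1 convolution with one output channel is a weighted sum over channels.
    gate = sigmoid(np.tensordot(w_gate, stacked, axes=([0], [0])) + b_gate)  # (H, W)
    # Gate values near 0 suppress general features; values near 1 pass them through.
    return task_feat + gate[None] * general


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    task = rng.normal(size=(4, 8, 12))     # task branch at an arbitrary input size
    general = rng.normal(size=(4, 4, 6))   # frozen branch at a coarser resolution
    fused = gated_fusion(task, general, rng.normal(size=8), 0.0)
    print(fused.shape)
```

Because the gate is computed per spatial location, the fusion works for any input height and width, which matches the flexibility the abstract describes; in a trained model the gate parameters would be learned while the general encoder stays frozen.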

Paper Structure

This paper contains 16 sections, 7 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Crack segmentation training paradigms. (a) Traditional supervised learning allows flexibility in image sizes and model architectures but suffers from limited generalization capability. (b) The "pre-training + fine-tuning" paradigm, as exemplified by SAM, offers improved generalization but is constrained by fixed image resolutions and limited architectural adaptability. (c) The proposed FlexiCrackNet supports customizable image sizes and model architectures while significantly enhancing generalization ability.
  • Figure 2: The first nine feature maps of the five stages in the EdgeSAM encoder.
  • Figure 3: The overall pipeline of FlexiCrackNet. The pipeline adopts an encoder-decoder structure for crack segmentation, where IGAM modules are integrated into the encoder to fuse general semantic and crack-specific features. IIM within IGAM generates crack-specific attention masks, enabling effective feature fusion.
  • Figure 4: The structure of IIM.
  • Figure 5: Visualization of samples from the DeepCrack test set.
  • ...and 3 more figures