SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures

Hui Liu, Chen Jia, Fan Shi, Xu Cheng, Shengyong Chen

TL;DR

This paper addresses pixel-level crack segmentation across diverse materials and conditions by proposing SCSegamba, a lightweight Vision Mamba network built around a Structure-Aware Visual State Space module (SAVSS) and a Multi-scale Feature Segmentation head (MFS). SAVSS combines a lightweight Gated Bottleneck Convolution (GBC) with a Structure-Aware Scanning Strategy (SASS) to capture crack morphology and texture efficiently, and MFS fuses multi-scale features to produce high-quality segmentation maps. The model achieves state-of-the-art performance with only 2.8M parameters and a model size of 37 MB, reaching an F1 of 0.8390 and an mIoU of 0.8479 on the multi-scenario dataset at roughly 18.16 GFLOPs. The work also includes ablation studies, an analysis of the loss components, and a real-world deployment scenario, which together suggest potential for edge deployment and real-time crack monitoring. The loss is $L = \alpha L_{Dice} + \beta L_{BCE}$ with $\alpha:\beta = 1:5$.
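The combined loss above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the summary gives the ratio $\alpha:\beta = 1:5$ but not the absolute weights, so `alpha=1.0` and `beta=5.0` are assumed here, and the smoothed Dice formulation and the helper names (`dice_loss`, `bce_loss`, `combined_loss`) are hypothetical rather than taken from the authors' code.

```python
import math


def dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss over flattened per-pixel probabilities (assumed smoothed form)."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flattened per-pixel probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)


def combined_loss(probs, targets, alpha=1.0, beta=5.0):
    """L = alpha * L_Dice + beta * L_BCE; alpha=1, beta=5 is an assumed choice matching 1:5."""
    return alpha * dice_loss(probs, targets) + beta * bce_loss(probs, targets)


# Toy example: three pixels, two of them crack pixels.
targets = [1, 0, 1]
perfect = combined_loss([1.0, 0.0, 1.0], targets)
uncertain = combined_loss([0.5, 0.5, 0.5], targets)
```

With the 1:5 weighting, the per-pixel BCE term dominates the total, while the Dice term contributes an overlap measure that is less sensitive to the heavy class imbalance typical of thin crack regions.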

Abstract

Pixel-level segmentation of structural cracks across various scenarios remains a considerable challenge. Current methods struggle to model crack morphology and texture effectively and to balance segmentation quality against low computational resource usage. To overcome these limitations, we propose a lightweight Structure-Aware Vision Mamba Network (SCSegamba), capable of generating high-quality pixel-level segmentation maps by leveraging both the morphological information and texture cues of crack pixels with minimal computational cost. Specifically, we developed a Structure-Aware Visual State Space module (SAVSS), which incorporates a lightweight Gated Bottleneck Convolution (GBC) and a Structure-Aware Scanning Strategy (SASS). GBC is effective at modeling the morphological information of cracks, while SASS enhances the perception of crack topology and texture by strengthening the continuity of semantic information between crack pixels. Experiments on crack benchmark datasets demonstrate that our method outperforms other state-of-the-art (SOTA) methods, achieving the highest performance with only 2.8M parameters. On the multi-scenario dataset, our method reached 0.8390 in F1 score and 0.8479 in mIoU. The code is available at https://github.com/Karl1109/SCSegamba.
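For readers unfamiliar with the two reported metrics, the following minimal sketch shows how F1 and mIoU are commonly computed for a binary crack mask. It assumes that mIoU is the mean of the crack-class and background-class IoU, which is a common convention but is not stated in the abstract; the function names are hypothetical and the paper's exact evaluation protocol (thresholding, tolerance, averaging over images) may differ.

```python
def confusion(pred, gt):
    """Count TP, FP, FN, TN over flattened binary masks (1 = crack, 0 = background)."""
    tp = fp = fn = tn = 0
    for p, g in zip(pred, gt):
        if p and g:
            tp += 1
        elif p and not g:
            fp += 1
        elif not p and g:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(pred, gt):
    """F1 = 2TP / (2TP + FP + FN); defined as 1.0 when there are no positives at all."""
    tp, fp, fn, _ = confusion(pred, gt)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def mean_iou(pred, gt):
    """Mean of crack IoU and background IoU (assumed two-class convention)."""
    tp, fp, fn, tn = confusion(pred, gt)
    ious = []
    for inter, union in ((tp, tp + fp + fn), (tn, tn + fp + fn)):
        if union:  # skip a class that is absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 1.0


# Toy masks: six pixels, flattened.
pred = [1, 1, 0, 0, 1, 0]
gt = [1, 0, 0, 0, 1, 1]
f1 = f1_score(pred, gt)   # 2 TP, 1 FP, 1 FN -> 4 / 6
miou = mean_iou(pred, gt)  # crack IoU 2/4, background IoU 2/4 -> 0.5
```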

Paper Structure

This paper contains 22 sections, 21 equations, 10 figures, 11 tables, 1 algorithm.

Figures (10)

  • Figure 1: Performance of SCSegamba on the multi-scenario TUT dataset [liu2024staircase]. (a) Comparison with SOTA methods. (b) Impact of different SAVSS layer numbers on performance, with normalized metrics; FLOPs (G), Params (M), and Size (MB) decrease towards the edges. (c) Visual results under complex interference.
  • Figure 2: Overview of our proposed method. (a) illustrates the overall architecture of SCSegamba and the processing flow for crack images. (b) displays the structure of the SAVSS block. The input crack image undergoes comprehensive morphological and texture feature extraction through SAVSS, while MFS produces a high-quality pixel-level segmentation map.
  • Figure 3: Architecture of GBC. It employs bottleneck convolution to efficiently reduce the parameters and computational load, while the gating mechanism enhances the model's adaptability in processing diverse crack patterns and complex backgrounds. GN represents group normalization.
  • Figure 4: Illustration of our proposed SASS and other scanning strategies. The first row presents four commonly used single scanning paths, along with our proposed diagonal snake path. The second row illustrates the execution flow of our proposed SASS scanning strategy.
  • Figure 5: Performance comparison between using BottConv and raw convolution in GBC on the TUT dataset [liu2024staircase].
  • ...and 5 more figures
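
The diagonal snake path named in the Figure 4 caption can be pictured with a small sketch. The code below generates one plausible diagonal zigzag ordering over an H×W grid of patch tokens, in which consecutive tokens are always spatial neighbors. The starting corner, the diagonal direction, and the function names are assumptions made for illustration; the caption does not specify them, and the full SASS combines several scanning paths that are not reproduced here.

```python
def diagonal_snake_order(h, w):
    """Return (row, col) positions of an h x w grid visited along anti-diagonals,
    reversing direction on every other diagonal so the path never jumps."""
    order = []
    for d in range(h + w - 1):  # d = row + col indexes each anti-diagonal
        rows = range(max(0, d - w + 1), min(h, d + 1))
        cells = [(r, d - r) for r in rows]
        if d % 2 == 0:
            cells.reverse()  # alternate direction to obtain the snake pattern
        order.extend(cells)
    return order


def flatten_indices(order, w):
    """Map (row, col) positions to indices into a row-major flattened token sequence."""
    return [r * w + c for r, c in order]


path = diagonal_snake_order(3, 3)
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (1, 2), (2, 1), (2, 2)]
token_order = flatten_indices(path, 3)
# [0, 1, 3, 6, 4, 2, 5, 7, 8]
```

Compared with a plain row-major scan, which jumps from the end of one row to the start of the next, a snake-style ordering keeps adjacent sequence positions adjacent in the image, which is the kind of continuity between crack pixels that the abstract attributes to SASS.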