SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures
Hui Liu, Chen Jia, Fan Shi, Xu Cheng, Shengyong Chen
TL;DR
This paper addresses pixel-level crack segmentation across diverse materials and conditions by proposing SCSegamba, a lightweight Vision Mamba network built around a Structure-Aware Visual State Space module (SAVSS) and a Multi-scale Feature Segmentation head (MFS). SAVSS combines a lightweight Gated Bottleneck Convolution (GBC) with a Structure-Aware Scanning Strategy (SASS) to capture crack morphology and texture efficiently, and MFS fuses multi-scale features to produce high-quality segmentation maps. The model achieves state-of-the-art performance with only 2.8M parameters and a 37 MB model size, reaching an F1 of 0.8390 and an mIoU of 0.8479 on the multi-scenario dataset while keeping FLOPs competitive at ~18.16G. Training uses the combined loss $L = \alpha L_{Dice} + \beta L_{BCE}$ with $\alpha:\beta = 1:5$. The work also supports practical viability through ablation studies, a detailed analysis of the loss components, and a real-world deployment scenario, indicating potential for edge deployment and real-time crack monitoring.
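The combined loss quoted above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the smoothing constants, and the absolute weights `alpha=1.0`, `beta=5.0` are assumptions chosen only to respect the stated ratio $\alpha:\beta = 1:5$, and the inputs are taken to be flattened per-pixel crack probabilities and binary labels.

```python
import math

def dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss over flattened per-pixel probabilities (assumed form)."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)

def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clamped to avoid log(0)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(probs)

def combined_loss(probs, targets, alpha=1.0, beta=5.0):
    """L = alpha * L_Dice + beta * L_BCE; alpha:beta = 1:5 as stated in the text.
    The absolute values 1.0 and 5.0 are an assumption."""
    return alpha * dice_loss(probs, targets) + beta * bce_loss(probs, targets)

# Toy example: four pixels, two of which are crack pixels.
targets = [1.0, 0.0, 1.0, 0.0]
good = combined_loss([0.9, 0.1, 0.8, 0.2], targets)
bad = combined_loss([0.4, 0.6, 0.3, 0.7], targets)
print(good < bad)  # a closer prediction should give a lower loss
```

With this weighting the BCE term dominates the per-pixel signal, while the Dice term still rewards overlap on the thin crack region, which typically covers only a small fraction of the image.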
Abstract
Pixel-level segmentation of structural cracks across various scenarios remains a considerable challenge. Current methods struggle to model crack morphology and texture effectively and to balance segmentation quality against low computational resource usage. To overcome these limitations, we propose a lightweight Structure-Aware Vision Mamba Network (SCSegamba), capable of generating high-quality pixel-level segmentation maps by leveraging both the morphological information and texture cues of crack pixels with minimal computational cost. Specifically, we develop a Structure-Aware Visual State Space module (SAVSS), which incorporates a lightweight Gated Bottleneck Convolution (GBC) and a Structure-Aware Scanning Strategy (SASS). The GBC is designed to model the morphological information of cracks effectively, while the SASS enhances the perception of crack topology and texture by strengthening the continuity of semantic information between crack pixels. Experiments on crack benchmark datasets demonstrate that our method outperforms other state-of-the-art (SOTA) methods, achieving the highest performance with only 2.8M parameters. On the multi-scenario dataset, our method reaches an F1 score of 0.8390 and an mIoU of 0.8479. The code is available at https://github.com/Karl1109/SCSegamba.
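For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute F1 and mIoU for a binary crack mask. It is an illustrative assumption rather than the paper's evaluation protocol: the function name `f1_and_miou` is hypothetical, mIoU is taken here as the mean of the crack and background IoU, and any thresholding of predicted probabilities is assumed to have happened already.

```python
def f1_and_miou(pred, target):
    """Compute F1 (crack class) and mean IoU over {background, crack}
    from flattened binary masks. Assumed definitions, not the paper's code."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    iou_crack = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    iou_background = tn / (tn + fp + fn) if (tn + fp + fn) else 0.0
    miou = (iou_crack + iou_background) / 2.0
    return f1, miou

# Toy masks: 1 marks a crack pixel, 0 marks background.
pred = [1, 1, 0, 0, 0, 1]
target = [1, 0, 0, 0, 1, 1]
f1, miou = f1_and_miou(pred, target)
print(f1, miou)
```

Because crack pixels are usually a small minority, averaging the crack and background IoU keeps a model from scoring well simply by predicting background everywhere.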
