Hybrid-Segmentor: A Hybrid Approach to Automated Fine-Grained Crack Segmentation in Civil Infrastructure
June Moh Goo, Xenios Milidonis, Alessandro Artusi, Jan Boehm, Carlo Ciliberto
TL;DR
This work tackles automated crack segmentation in civil infrastructure by combining dataset refinement with a dual-encoder architecture that fuses local CNN features with global transformer context. The Hybrid-Segmentor is built from a ResNet-50 CNN path and a SegFormer-inspired transformer path. The transformer path uses overlapping patch embedding, efficient self-attention, and Mix-FFN to capture multi-scale crack details across diverse surfaces, and a lightweight decoder fuses the features of both paths for the final segmentation. The authors also introduce a refined crack dataset of 12,000 images, built by harmonizing 13 public datasets, to support robust generalization. Trained with a balanced-weight BCE-Dice loss, the model achieves state-of-the-art performance on crack segmentation benchmarks and shows strong qualitative improvements on crack discontinuities and challenging imaging conditions. The paper also acknowledges its limitations and outlines directions for future work.
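The balanced BCE-Dice objective mentioned above can be sketched as follows. This is a minimal, framework-free illustration, not the authors' implementation: it assumes that "balanced" means equal 0.5 weights on the two terms, that the Dice term uses a smoothing constant of 1.0, and that the inputs are flattened per-pixel crack probabilities and binary targets. The function name `bce_dice_loss` and its parameter values are hypothetical.

```python
import math
from typing import Sequence


def bce_dice_loss(
    probs: Sequence[float],
    targets: Sequence[float],
    bce_weight: float = 0.5,  # assumption: "balanced" = 0.5 BCE + 0.5 Dice
    smooth: float = 1.0,      # assumption: smoothing constant for the Dice term
    eps: float = 1e-7,        # clamps probabilities so log() stays finite
) -> float:
    """Weighted sum of binary cross-entropy and soft Dice loss over flattened pixels."""
    if len(probs) != len(targets) or len(probs) == 0:
        raise ValueError("probs and targets must be non-empty and the same length")

    # Binary cross-entropy, averaged over all pixels.
    bce = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        bce -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    bce /= len(probs)

    # Soft Dice loss: 1 - (2 * overlap + s) / (sum(P) + sum(T) + s).
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = 1.0 - (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)

    return bce_weight * bce + (1.0 - bce_weight) * dice


# Example: four pixels, two of which are crack pixels.
print(bce_dice_loss([0.9, 0.2, 0.8, 0.1], [1, 0, 1, 0]))
```

Combining the two terms is a common choice for thin structures like cracks: BCE gives stable per-pixel gradients, while the Dice term directly rewards overlap with the sparse foreground class.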
Abstract
Detecting and segmenting cracks in infrastructure such as roads and buildings is crucial for safety and cost-effective maintenance. Despite the potential of deep learning, achieving precise results and handling diverse crack types remain challenging. With the proposed dataset and model, we aim to improve crack detection and infrastructure maintenance. We introduce Hybrid-Segmentor, an encoder-decoder approach capable of extracting both fine-grained local and global crack features. This improves the model's ability to generalize across cracks of different shapes, sizes, and surfaces. To keep computational cost low for practical use while maintaining high generalization capability, we incorporate a self-attention model at the encoder level and reduce the complexity of the decoder. The proposed model outperforms existing benchmark models across five quantitative metrics (accuracy 0.971, precision 0.804, recall 0.744, F1-score 0.770, and IoU 0.630), achieving state-of-the-art performance.
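All five reported metrics can be derived from pixel-level confusion counts. The sketch below uses a hypothetical helper, `segmentation_metrics`, that pools the pixels of a single binary mask (1 = crack, 0 = background). Whether the reported scores are averaged per image or computed over the whole test set is not stated here, so the aggregation is an assumption; with per-image averaging, the mean F1 need not equal the harmonic mean of the mean precision and mean recall.

```python
from typing import Dict, Sequence


def segmentation_metrics(pred: Sequence[int], target: Sequence[int]) -> Dict[str, float]:
    """Pixel-level metrics for binary masks (1 = crack, 0 = background)."""
    # Confusion counts over all pixels of the (flattened) mask.
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)

    def safe_div(num: float, den: float) -> float:
        # Returns 0.0 when a denominator is empty (e.g. no predicted cracks).
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "iou": safe_div(tp, tp + fp + fn),  # Jaccard index of the crack class
    }


print(segmentation_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Because crack pixels are typically a small fraction of each image, accuracy is dominated by correctly classified background. This is why accuracy can sit near 0.97 while IoU, which ignores true negatives, is considerably lower.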
