Hybrid-Segmentor: A Hybrid Approach to Automated Fine-Grained Crack Segmentation in Civil Infrastructure

June Moh Goo, Xenios Milidonis, Alessandro Artusi, Jan Boehm, Carlo Ciliberto

TL;DR

This work tackles automated crack segmentation in civil infrastructure by combining dataset refinement with a dual-encoder architecture that fuses local CNN features and global transformer context. The Hybrid-Segmentor, built from a ResNet-50 CNN path and a SegFormer-inspired transformer path, uses overlapping patch embedding, efficient self-attention, and Mix-FFN to capture multi-scale crack details across diverse surfaces, while a lightweight decoder fuses features for final segmentation. A large, refined crack dataset of 12,000 images is introduced by harmonizing 13 public datasets, enabling robust generalization. The model achieves state-of-the-art performance on crack segmentation benchmarks, using BCE-DICE loss with a balanced weight and demonstrating strong qualitative improvements in handling discontinuities and challenging imaging conditions, with acknowledged limitations and clear directions for future work.
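
The summary mentions a BCE-DICE loss with a balanced weight but does not spell out the formula. As a rough illustration only, the sketch below combines binary cross-entropy and a soft Dice loss over a flat list of pixel probabilities. The function name `bce_dice_loss`, the `bce_weight` parameter, its default of 0.5 (one reading of "balanced"), and the `smooth` term are assumptions made for this example, not details confirmed by the paper.

```python
import math


def bce_dice_loss(probs, targets, bce_weight=0.5, eps=1e-7, smooth=1.0):
    """Weighted sum of binary cross-entropy and soft Dice loss.

    probs      : predicted crack probabilities in [0, 1], one per pixel
    targets    : ground-truth labels, 1 for crack and 0 for background
    bce_weight : assumed mixing weight; 0.5 weights both terms equally
    eps        : clamp value that keeps log() away from zero
    smooth     : assumed smoothing constant for the Dice ratio
    """
    if not probs or len(probs) != len(targets):
        raise ValueError("probs and targets must be non-empty and equal length")

    # Binary cross-entropy, averaged over pixels.
    bce = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        bce -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    bce /= len(probs)

    # Soft Dice: overlap between prediction and ground truth,
    # normalised by their combined mass.
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)
    dice_loss = 1.0 - dice

    return bce_weight * bce + (1.0 - bce_weight) * dice_loss


if __name__ == "__main__":
    # Toy 4-pixel example: two crack pixels, two background pixels.
    print(bce_dice_loss([0.9, 0.2, 0.8, 0.1], [1, 0, 1, 0]))
```

The usual motivation for pairing the two terms is that cross-entropy scores each pixel independently, while Dice scores the overlap of the predicted and true crack regions, which matters when crack pixels are a small minority of the image.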

Abstract

Detecting and segmenting cracks in infrastructure, such as roads and buildings, is crucial for safety and cost-effective maintenance. Despite the potential of deep learning, challenges remain in achieving precise results and handling diverse crack types. With the proposed dataset and model, we aim to enhance crack detection and infrastructure maintenance. We introduce Hybrid-Segmentor, an encoder-decoder based approach that is capable of extracting both fine-grained local and global crack features. This allows the model to improve its generalization capabilities in distinguishing various types of shapes, surfaces and sizes of cracks. To keep the computational cost low for practical purposes while maintaining the high generalization capability of the model, we incorporate a self-attention model at the encoder level and reduce the complexity of the decoder component. The proposed model outperforms existing benchmark models across 5 quantitative metrics (accuracy 0.971, precision 0.804, recall 0.744, F1-score 0.770, and IoU score 0.630), achieving state-of-the-art status.
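
For readers less familiar with the five metrics quoted above, the following minimal sketch computes them from binary pixel masks using the standard confusion-matrix definitions. The function name `segmentation_metrics` and the toy masks are made up for illustration. How the paper aggregates scores (per image or over the whole test set) is not stated in this summary, so the sketch does not attempt to reproduce the reported numbers.

```python
def segmentation_metrics(pred, target):
    """Pixel-wise metrics for binary masks given as flat 0/1 sequences.

    pred   : predicted labels, 1 for crack and 0 for background
    target : ground-truth labels with the same encoding
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    pairs = list(zip(pred, target))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # crack found
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # false alarm
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # crack missed
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)  # background kept

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero on degenerate masks.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
    }


if __name__ == "__main__":
    pred = [1, 1, 0, 0, 1, 0, 0, 0]
    target = [1, 0, 0, 0, 1, 1, 0, 0]
    for name, value in segmentation_metrics(pred, target).items():
        print(f"{name}: {value:.3f}")
```

Accuracy counts correctly labelled background pixels, whereas IoU and F1 ignore true negatives. This is one reason accuracy can sit far above IoU on masks where most pixels are background.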

Paper Structure

This paper contains 25 sections, 7 equations, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: Representative failures in crack detection by traditional models: (a) shows a prediction by a Fully Convolutional Network (FCN) on a blurred image, incorrectly marked as a crack, highlighting difficulties with image clarity. (b), from a U-Net architecture, displays a brick pattern where the borders of the bricks are wrongly identified as cracks, revealing challenges in differentiating structural boundaries.
  • Figure 2: The figure shows the improvement in small holes, discontinuity and thinness of the ground truth after applying appropriate image processing methods.
  • Figure 3: The Hybrid-Segmentor architecture: the upper path for CNN and the lower for Transformers. Each path generates feature maps at every layer, and the central blue boxes represent the concatenation of these feature maps.
  • Figure 4: Hybrid model performance compared against CNN and transformer paths. The CNN path captures detailed contours (red circles), while the Transformer path gives an overall structure but with thicker predictions (blue circles).
  • Figure 5: Example crack images segmented by our model and benchmarked models. The red ovals highlight the areas where our model outperforms other benchmarked models. In examples without red ovals, such as (F) and (H), our model demonstrates strong performance across overall structures.
  • ...and 1 more figure