Self-Supervised Multi-Scale Transformer with Attention-Guided Fusion for Efficient Crack Detection

Blessing Agyei Kyem, Joshua Kofi Asamoah, Eugene Denteh, Andrews Danyo, Armstrong Aboah

TL;DR

This work tackles the high cost of pixel-level annotations in pavement crack segmentation by proposing Crack-Segmenter, a fully self-supervised framework that eliminates the need for ground-truth masks. It fuses three novel components—Scale-Adaptive Embedder, Directional Attention Transformer, and Attention-Guided Fusion—together with cross-scale consistency losses to learn robust crack representations from unlabeled images. Across ten public crack datasets, Crack-Segmenter outperforms 13 state-of-the-art fully supervised methods on metrics such as $mIoU$, $Dice$, $XOR$, and $HD$, with ablation and statistical analyses confirming the value of each module. The approach enables scalable, annotation-free infrastructure monitoring and provides a strong foundation for future explorations in self-supervised crack detection and multi-scale transformer architectures.

Abstract

Pavement crack detection has long depended on costly and time-intensive pixel-level annotations, which limit its scalability for large-scale infrastructure monitoring. To overcome this barrier, this paper examines the feasibility of achieving effective pixel-level crack segmentation entirely without manual annotations. Building on this objective, a fully self-supervised framework, Crack-Segmenter, is developed, integrating three complementary modules: the Scale-Adaptive Embedder (SAE) for robust multi-scale feature extraction, the Directional Attention Transformer (DAT) for maintaining linear crack continuity, and the Attention-Guided Fusion (AGF) module for adaptive feature integration. In evaluations on ten public datasets, Crack-Segmenter consistently outperforms 13 state-of-the-art supervised methods across all major metrics, including mean Intersection over Union (mIoU), Dice score, XOR, and Hausdorff Distance (HD). These findings demonstrate that annotation-free crack detection is not only feasible but can also surpass supervised approaches, enabling transportation agencies and infrastructure managers to conduct scalable and cost-effective monitoring. This work advances self-supervised learning for crack segmentation and motivates further research on pavement crack detection.
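To make two of the reported overlap metrics concrete, the following is a minimal sketch of IoU and Dice for a single pair of binary masks. It uses the standard textbook definitions rather than the paper's evaluation code; the function name `iou_and_dice`, the toy masks, and the convention of scoring two empty masks as 1.0 are assumptions made for illustration. Averaging the IoU over images or classes to obtain mIoU is left out because the abstract does not specify the averaging scheme.

```python
import numpy as np


def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two binary masks of the same shape.

    Hypothetical helper, not the paper's code. Nonzero pixels are
    treated as crack, zero pixels as background.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()

    # Assumed convention: two empty masks count as a perfect match.
    iou = intersection / union if union else 1.0
    dice = 2 * intersection / total if total else 1.0
    return float(iou), float(dice)


# Toy 3x4 masks: 4 predicted crack pixels, 4 reference crack pixels,
# 3 of which overlap.
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 0, 0],
                 [0, 1, 0, 0]])
target = np.array([[0, 1, 0, 0],
                   [0, 1, 0, 0],
                   [0, 1, 1, 0]])

iou, dice = iou_and_dice(pred, target)
print(f"IoU = {iou:.2f}, Dice = {dice:.2f}")  # IoU = 0.60, Dice = 0.75
```

With 3 overlapping pixels out of a union of 5, IoU is 3/5 and Dice is 2·3/(4+4). Dice is never lower than IoU for the same masks, which is why the two numbers are usually reported side by side rather than interchangeably.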

Paper Structure

This paper contains 38 sections, 42 equations, 11 figures, 8 tables.

Figures (11)

  • Figure 1: Overall framework for the proposed architecture.
  • Figure 2: Scale-Adaptive Embedder Module.
  • Figure 3: Directional Attention Transformer Module.
  • Figure 4: Attention-Guided Fusion Module.
  • Figure 5: Radar plots of validation Dice and mIoU scores for all baseline models and Crack-Segmenter across all 10 datasets.
  • ...and 6 more figures