Self-Supervised Multi-Scale Transformer with Attention-Guided Fusion for Efficient Crack Detection

Blessing Agyei Kyem; Joshua Kofi Asamoah; Eugene Denteh; Andrews Danyo; Armstrong Aboah

TL;DR

This work tackles the high cost of pixel-level annotations in pavement crack segmentation by proposing Crack-Segmenter, a fully self-supervised framework that eliminates the need for ground-truth masks. It combines three novel components (Scale-Adaptive Embedder, Directional Attention Transformer, and Attention-Guided Fusion) with cross-scale consistency losses to learn robust crack representations from unlabeled images. Across ten public crack datasets, Crack-Segmenter outperforms 13 state-of-the-art fully supervised methods on mIoU, Dice, XOR, and HD, with ablation and statistical analyses confirming the value of each module. The approach enables scalable, annotation-free infrastructure monitoring and provides a strong foundation for future work on self-supervised crack detection and multi-scale transformer architectures.

Abstract

Pavement crack detection has long depended on costly and time-intensive pixel-level annotations, which limit its scalability for large-scale infrastructure monitoring. To overcome this barrier, this paper examines the feasibility of achieving effective pixel-level crack segmentation entirely without manual annotations. To this end, a fully self-supervised framework, Crack-Segmenter, is developed, integrating three complementary modules: the Scale-Adaptive Embedder (SAE) for robust multi-scale feature extraction, the Directional Attention Transformer (DAT) for maintaining linear crack continuity, and the Attention-Guided Fusion (AGF) module for adaptive feature integration. In evaluations on ten public datasets, Crack-Segmenter consistently outperforms 13 state-of-the-art supervised methods across all major metrics, including mean Intersection over Union (mIoU), Dice score, XOR, and Hausdorff Distance (HD). These findings demonstrate that annotation-free crack detection is not only feasible but also superior, enabling transportation agencies and infrastructure managers to conduct scalable and cost-effective monitoring. This work advances self-supervised learning and motivates further research on pavement crack detection.
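For readers unfamiliar with the overlap metrics named in the abstract, the following is a minimal sketch of how IoU, Dice, and a two-class mIoU are commonly computed for binary crack masks. It is not taken from the paper: the function names, the choice to average IoU over the crack and background classes, and the convention of returning 1.0 when both masks are empty are all assumptions made for illustration. XOR and HD are omitted because their exact definitions are not given here.

```python
import numpy as np


def binary_iou(pred, target):
    """Intersection over Union for two binary masks (hypothetical helper)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)


def dice_score(pred, target):
    """Dice coefficient: 2 * |A and B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return float(2 * np.logical_and(pred, target).sum() / total)


def mean_iou(pred, target):
    """Mean of crack IoU and background IoU (an assumed two-class mIoU)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    return 0.5 * (binary_iou(pred, target) + binary_iou(~pred, ~target))


if __name__ == "__main__":
    # Toy 2x3 masks: 1 marks a crack pixel, 0 marks background.
    pred = [[1, 1, 0], [0, 0, 0]]
    target = [[1, 0, 0], [1, 0, 0]]
    print("IoU :", binary_iou(pred, target))
    print("Dice:", dice_score(pred, target))
    print("mIoU:", mean_iou(pred, target))
```

In the toy example the masks share one crack pixel out of three in their union, so the crack IoU is 1/3 and the Dice score is 0.5. Dice is always at least as large as IoU for the same pair of masks, which is why the two are usually reported together.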
