Multi-class Road Defect Detection and Segmentation using Spatial and Channel-wise Attention for Autonomous Road Repairing

Jongmin Yu; Chen Bene Chi; Sebastiano Fichera; Paolo Paoletti; Devansh Mehta; Shan Luo

TL;DR

The paper addresses the challenge of simultaneously detecting and segmenting multiple road defect classes in a unified end-to-end framework. It introduces SCM-MRCNN, a Mask-RCNN-based architecture augmented with Spatial and Channel-wise Multi-head Attention (SCM-attention) blocks that learn robust spatio-channel representations for multi-class defect detection and segmentation. A new RoadEYE dataset with nine defect classes provides a benchmark for both bounding-box detection and pixel-level segmentation. Experiments on RDD2020, the CS datasets, and RoadEYE report state-of-the-art performance, with gains in $mAP$, $AP_M$, $AP_B$, and $AIU$. The results indicate that capturing long-range dependencies in both the spatial and channel dimensions improves defect understanding, which matters for autonomous road repair systems that need precise localization and segmentation to optimize repair material usage.

Abstract

Road pavement detection and segmentation are critical for developing autonomous road repair systems. However, developing an instance segmentation method that simultaneously performs multi-class defect detection and segmentation is challenging due to the textural simplicity of road pavement images, the diversity of defect geometries, and the morphological ambiguity between classes. We propose a novel end-to-end method for multi-class road defect detection and segmentation. The proposed method comprises multiple spatial and channel-wise attention blocks that learn global representations across the spatial and channel dimensions. Through these attention blocks, more globally generalised representations of the morphological information (spatial characteristics) of road defects and of the colour and depth information of images can be learned. To demonstrate the effectiveness of our framework, we conducted various ablation studies and comparisons with prior methods on a newly collected dataset annotated with nine road defect classes. The experiments show that our proposed method outperforms existing state-of-the-art methods for multi-class road defect detection and segmentation.
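
The idea of attending over both the spatial and the channel dimension can be illustrated with a minimal NumPy sketch. This is not the authors' SCM-attention block: it uses a single head, identity query/key/value projections, and a plain residual sum to fuse the two branches, all of which are simplifying assumptions made here for illustration. The function names (`self_attention`, `spatial_channel_attention`) are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # Scaled dot-product self-attention over the rows of `tokens`.
    # Assumption: identity Q/K/V projections (no learned weights).
    d = tokens.shape[-1]
    weights = softmax(tokens @ tokens.T / np.sqrt(d))
    return weights @ tokens, weights

def spatial_channel_attention(feat):
    # feat: feature map of shape (C, H, W).
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)
    # Spatial branch: each of the H*W positions is a token with C features.
    spatial_out, spatial_w = self_attention(flat.T)
    # Channel branch: each of the C channels is a token with H*W features.
    channel_out, channel_w = self_attention(flat)
    # Assumption: fuse both branches with the input by a residual sum.
    fused = flat + spatial_out.T + channel_out
    return fused.reshape(c, h, w), spatial_w, channel_w

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
out, spatial_w, channel_w = spatial_channel_attention(feat)
print(out.shape, spatial_w.shape, channel_w.shape)
```

The spatial attention matrix relates every position to every other position, and the channel attention matrix relates every channel to every other channel, which is the sense in which both branches capture long-range dependencies.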

Paper Structure (11 sections, 7 equations, 3 figures, 5 tables)

Introduction
Related works
The proposed method
Method Overview
Training and Instance Segmentation
Experimental results
Dataset and evaluation metrics
Implementation
Ablation study
Comparison with existing methods
Conclusion

Figures (3)

Figure 1: Visualisation of kernel similarities computed with Centered Kernel Alignment (CKA) (Kornblith et al., 2019). (a) denotes the kernel similarity map for segmentation (Seg-based model) and detection (Det-based model) models. (b) denotes the kernel similarity map for segmentation and instance segmentation (InstanceSeg-based model) models. (c) denotes the kernel similarity map for detection and instance segmentation models. Each axis denotes the depth of layers. The brighter the colour, the higher the kernel similarity.
Figure 2: Structural details of the proposed SCM-MRCNN and SCM-attention block. (a) shows the architectural details of the proposed SCM-MRCNN. C# and D# denote convolutional and deconvolutional layers, respectively. 'Concat' represents a concatenation of two latent features. (b) and (c) show the structural details of the proposed spatial and channel-wise multi-head attention (SCM-attention) block and the patch-based channel attention, respectively. $\otimes$ and $\oplus$ denote element-wise multiplication and element-wise addition, respectively.
Figure 3: Example snapshots from the RoadEYE dataset. Top: images. Bottom: instance segmentation annotations for multi-class road defect detection and segmentation.
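
A layer-by-layer similarity map like the one described for Figure 1 can be sketched with linear CKA. The snippet below assumes the linear-kernel variant and uses random arrays as stand-ins for layer activations; the kernel choice, the helper names (`linear_cka`, `cka_map`), and the data are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def linear_cka(x, y):
    # x: (n_samples, p1) and y: (n_samples, p2) activations for the same inputs.
    # Centre each feature column before comparing.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)

def cka_map(acts_a, acts_b):
    # Entry (i, j) compares layer i of model A with layer j of model B.
    return np.array([[linear_cka(a, b) for b in acts_b] for a in acts_a])

# Hypothetical activations: 32 samples, 3 layers for model A, 4 for model B.
rng = np.random.default_rng(0)
acts_a = [rng.standard_normal((32, 16)) for _ in range(3)]
acts_b = [rng.standard_normal((32, 24)) for _ in range(4)]
sim = cka_map(acts_a, acts_b)
print(sim.shape)
```

Each entry lies between 0 and 1, and a layer compared with itself scores 1, so plotting `sim` as an image gives a map in which brighter cells mark more similar layer pairs.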
