
RDD4D: 4D Attention-Guided Road Damage Detection And Classification

Asma Alkalbani, Muhammad Saqib, Ahmed Salim Alrawahi, Abbas Anwar, Chandarnath Adak, Saeed Anwar

TL;DR

This work tackles the lack of diverse, multi-type road-damage benchmarks by introducing the Diverse Road Damage Dataset (DRDD) and a 4D attention-enhanced detector, RDD4D. RDD4D extends an RTMDet-based architecture with Attention4D blocks that refine features across scales, improving detection of large and densely clustered road damage. On DRDD it reaches an AP of 0.458 on large damage, the best among the compared models, with a competitive overall AP of 0.445; on CrackTinyNet it reaches mAP@0.5 ≈ 0.825 with high recall. Together, the challenging dataset, dynamic soft-label assignment during training, and multi-scale attention make the approach relevant to automated road maintenance and scalable infrastructure monitoring.
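
The TL;DR credits part of the result to dynamic soft-label assignment, the label assigner RTMDet uses during training. The sketch below is a simplified, single-image NumPy version of the cost described for RTMDet: a classification cost that uses the prediction/ground-truth IoU as a soft label, a `-log(IoU)` regression cost, and a soft center prior, followed by dynamic top-k selection. It is an illustration under stated assumptions, not the RDD4D authors' code: the classification cost is computed only for each ground truth's own class, the "prior center inside the box" pre-filter is skipped, and the default weights (`reg_weight=3`, `alpha=10`, `beta=3`, `top_q=13`) are our assumed values. All function names are hypothetical.

```python
import numpy as np

def box_iou(a, b):
    """Pairwise IoU between (N, 4) and (M, 4) boxes in (x1, y1, x2, y2) format."""
    top_left = np.maximum(a[:, None, :2], b[None, :, :2])
    bottom_right = np.minimum(a[:, None, 2:], b[None, :, 2:])
    inter = np.clip(bottom_right - top_left, 0, None).prod(axis=2)
    area_a = (a[:, 2:] - a[:, :2]).prod(axis=1)
    area_b = (b[:, 2:] - b[:, :2]).prod(axis=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def dynamic_soft_label_assign(pred_boxes, pred_probs, prior_centers, strides,
                              gt_boxes, top_q=13, reg_weight=3.0,
                              alpha=10.0, beta=3.0, eps=1e-7):
    """Assign each prediction to a GT index, or -1 for background (simplified sketch).

    pred_probs[n, m] is prediction n's probability for GT m's class.
    """
    ious = box_iou(pred_boxes, gt_boxes)                       # (N, M)
    p = np.clip(pred_probs, eps, 1 - eps)
    # Classification cost: BCE against the IoU soft label, scaled by the squared gap.
    bce = -(ious * np.log(p) + (1 - ious) * np.log(1 - p))
    cls_cost = bce * (ious - p) ** 2
    # Regression cost: -log(IoU).
    reg_cost = -np.log(ious + eps)
    # Soft center prior: alpha ** (center distance measured in strides - beta).
    gt_centers = (gt_boxes[:, :2] + gt_boxes[:, 2:]) / 2
    dist = np.linalg.norm(prior_centers[:, None] - gt_centers[None], axis=2)
    center_cost = alpha ** (dist / strides[:, None] - beta)
    cost = cls_cost + reg_weight * reg_cost + center_cost      # (N, M)

    num_preds, num_gts = cost.shape
    matched = np.zeros(cost.shape, dtype=bool)
    for m in range(num_gts):
        # Dynamic k: sum of the top-q IoUs for this GT, at least 1.
        k = max(1, int(np.sort(ious[:, m])[::-1][:top_q].sum()))
        matched[np.argsort(cost[:, m])[:k], m] = True
    assigned = np.full(num_preds, -1)
    for n in range(num_preds):
        candidates = np.flatnonzero(matched[n])
        if candidates.size:  # a prediction matched to several GTs keeps the cheapest one
            assigned[n] = candidates[np.argmin(cost[n, candidates])]
    return assigned

# Toy example: two GT boxes, four predictions with uniform 0.5 class scores.
gt = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=float)
preds = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30], [50, 50, 60, 60]], dtype=float)
centers = np.array([[5, 5], [6, 6], [25, 25], [55, 55]], dtype=float)
assigned = dynamic_soft_label_assign(preds, np.full((4, 2), 0.5), centers, np.full(4, 8.0), gt)
print(assigned.tolist())  # → [0, -1, 1, -1]
```

Using IoU as the soft target means the classification score is trained toward localization quality rather than a hard 1, and the dynamic k lets well-covered ground truths claim more positives than poorly covered ones.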

Abstract

Road damage detection and assessment are crucial components of infrastructure maintenance. However, current methods often struggle to detect multiple types of road damage in a single image, particularly at varying scales. This is due to the lack of road datasets containing various damage types at varying scales. To address this gap, we first present a novel dataset, the Diverse Road Damage Dataset (DRDD), whose individual images capture diverse road damage types. We then present our model, RDD4D, which uses Attention4D blocks to refine features across multiple scales. The Attention4D module processes feature maps with an attention mechanism that combines positional encoding and "Talking Head" components to capture both local and global contextual information. In a comprehensive experimental comparison against various state-of-the-art models on our proposed dataset, our enhanced model demonstrated superior performance in detecting large road cracks, with an Average Precision (AP) of 0.458, while maintaining a competitive overall AP of 0.445. We also report results on the CrackTinyNet dataset, where our model achieved an improvement of around 0.21. The code, model weights, dataset, and results are available at https://github.com/msaqib17/Road_Damage_Detection.
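
The abstract describes Attention4D as attention over feature maps that combines positional encoding with "Talking Head" components. Attention4D comes from EfficientFormerV2, and the NumPy sketch below follows that style under simplifying assumptions: 1x1 projections (plain channel matmuls) to per-head queries, keys, and values; a learned bias indexed by the absolute row/column offset between pixels, standing in for positional encoding; and head-mixing matrices applied before and after the softmax (talking heads). It omits batch normalization, the depthwise local branch of the original module, and any training. The parameter names (`wq`, `th1`, `bias`, and so on) are hypothetical, and this is not the RDD4D authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relative_bias_index(h, w):
    """Map every (query, key) pixel pair to an index of its (|dy|, |dx|) offset."""
    ys, xs = np.divmod(np.arange(h * w), w)
    dy = np.abs(ys[:, None] - ys[None, :])
    dx = np.abs(xs[:, None] - xs[None, :])
    return dy * w + dx                                        # (N, N), values in [0, h*w)

def attention4d(x, params, num_heads):
    """Talking-heads self-attention over a (C, H, W) feature map (simplified sketch)."""
    c, h, w = x.shape
    n = h * w
    tokens = x.reshape(c, n)                                  # 1x1 conv == matmul over channels
    q = (params["wq"] @ tokens).reshape(num_heads, -1, n)     # (heads, d_k, N)
    k = (params["wk"] @ tokens).reshape(num_heads, -1, n)
    v = (params["wv"] @ tokens).reshape(num_heads, -1, n)     # (heads, d_v, N)
    logits = np.einsum("hdi,hdj->hij", q, k) * q.shape[1] ** -0.5
    logits = logits + params["bias"][:, relative_bias_index(h, w)]  # positional bias
    logits = np.einsum("gh,hij->gij", params["th1"], logits)  # mix heads before softmax
    attn = softmax(logits, axis=-1)
    attn = np.einsum("gh,hij->gij", params["th2"], attn)      # mix heads after softmax
    out = np.einsum("hij,hdj->hdi", attn, v).reshape(-1, n)   # (heads * d_v, N)
    return (params["wo"] @ np.maximum(out, 0)).reshape(c, h, w)  # ReLU, then projection

rng = np.random.default_rng(0)
C, H, W, heads, d_k, d_v = 16, 6, 6, 4, 8, 8
params = {
    "wq": rng.normal(size=(heads * d_k, C)) * 0.1,
    "wk": rng.normal(size=(heads * d_k, C)) * 0.1,
    "wv": rng.normal(size=(heads * d_v, C)) * 0.1,
    "bias": np.zeros((heads, H * W)),
    "th1": np.eye(heads) + 0.1 * rng.normal(size=(heads, heads)),
    "th2": np.eye(heads) + 0.1 * rng.normal(size=(heads, heads)),
    "wo": rng.normal(size=(C, heads * d_v)) * 0.1,
}
feat = rng.normal(size=(C, H, W))
out = attention4d(feat, params, heads)
print(out.shape)  # → (16, 6, 6)
```

Mixing heads before the softmax lets each head's attention pattern draw on the other heads' logits, and the post-softmax mix recombines the normalized maps. Because every one of the H×W positions attends to every other, cost grows quadratically with spatial resolution.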

Paper Structure

This paper contains 25 sections, 3 equations, 6 figures, and 8 tables.

Figures (6)

  • Figure 1: Major challenges in road damage detection. Representative examples: (a) weather conditions affecting damage visibility and appearance, (b) occlusions from vehicles partially hiding damaged areas, (c) diverse damage patterns requiring robust detection, (d) motion blur from vehicle movement degrading image quality, (e) instance-level variations in damage characteristics, and (f) shadow effects adding complexity to detection.
  • Figure 2: Qualitative comparison of road damage detection using different object detection techniques. The columns present (from left to right) ground-truth annotations, our proposed approach, RTMDet, and YOLOv8, applied to four representative test images. Each row shows the same input image processed by these techniques, enabling direct comparison of detection accuracy and precision. The results demonstrate that our proposed methodology achieves more accurate detection and classification of road damage categories compared to the baseline RTMDet and other state-of-the-art approaches.
  • Figure 3: Our proposed architecture: The flow of feature maps through the neck section of the object detection network. Attention4D blocks (highlighted in bold green) are strategically applied to the feature maps from the backbone, enhancing spatial and channel-wise information. The neck processes these enhanced features through top-down and bottom-up paths, creating a multi-scale feature representation.
  • Figure 4: Precision-recall curves for bounding-box prediction across all classes and all object areas for (a) PP-YOLOE, (b) RTMDet, (c) YOLOv7, (d) YOLOv6, (e) YOLOv8, and (f) ours. The error-analysis curves are C75 (gray), C50 (light gray), Loc (blue), Sim (red), Oth (green), BG (purple), and FN (orange), with their respective average precision scores shown in brackets. (Best viewed in color.)
  • Figure 5: The graph presents a ground-truth analysis of bounding box areas, where each bar represents the count of annotations. These annotations are categorized into three distinct groups—small, medium, and large—based on the size of their bounding box areas.
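
Figure 5 groups ground-truth annotations into small, medium, and large by bounding-box area, and the abstract reports AP specifically for large damage. The sketch below shows that bucketing, assuming the standard COCO thresholds (area below 32² px is small, below 96² px is medium, otherwise large); the DRDD thresholds are not stated here, so treat them as an assumption. It uses the box width times height, matching the figure's "bounding box areas" (COCO evaluation itself uses the annotation's `area` field). The function names and sample annotations are hypothetical.

```python
from collections import Counter

SMALL_MAX = 32 ** 2   # assumed COCO convention: area < 32^2 px is "small"
MEDIUM_MAX = 96 ** 2  # 32^2 <= area < 96^2 is "medium"; anything larger is "large"

def size_bucket(width, height):
    """Classify one bounding box by its pixel area."""
    area = width * height
    if area < SMALL_MAX:
        return "small"
    if area < MEDIUM_MAX:
        return "medium"
    return "large"

def count_by_size(annotations):
    """Count COCO-style annotations whose 'bbox' is [x, y, w, h]."""
    return Counter(size_bucket(a["bbox"][2], a["bbox"][3]) for a in annotations)

anns = [
    {"bbox": [0, 0, 20, 20]},     # 400 px^2    -> small
    {"bbox": [5, 5, 50, 60]},     # 3000 px^2   -> medium
    {"bbox": [0, 0, 200, 120]},   # 24000 px^2  -> large
    {"bbox": [0, 0, 100, 100]},   # 10000 px^2  -> large
]
counts = count_by_size(anns)
print(dict(counts))  # → {'small': 1, 'medium': 1, 'large': 2}
```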