Real-Time Dynamic Scale-Aware Fusion Detection Network: Take Road Damage Detection as an example

Weichao Pan, Xu Wang, Wenqing Huan

TL;DR

We design and propose the Dynamic Scale-Aware Fusion Detection Model (RT-DSAFDet), a multi-scale, adaptive road damage detection model that automatically removes background interference.

Abstract

Unmanned Aerial Vehicle (UAV)-based Road Damage Detection (RDD) is important for daily maintenance and safety in cities, especially because it significantly reduces labor costs. However, current UAV-based RDD research still faces many challenges. For example, damage of irregular size and orientation, the masking of damage by the background, and the difficulty of distinguishing damage from the background all significantly impair the ability of UAVs to detect road damage during daily inspection. To solve these problems and improve the performance of UAVs in real-time road damage detection, we design and propose three corresponding modules: a feature extraction module that flexibly adapts to shape and background; a module that fuses multi-scale perception and adapts to shape and background; and an efficient downsampling module. Based on these modules, we design a multi-scale, adaptive road damage detection model that automatically removes background interference, called the Dynamic Scale-Aware Fusion Detection Model (RT-DSAFDet). Experimental results on the public UAV-PDD2023 dataset show that RT-DSAFDet achieves a mAP50 of 54.2%, which is 11.1% higher than that of YOLOv10-m, an efficient variant of the latest real-time object detection model YOLOv10, while the number of parameters is reduced to 1.8M and the FLOPs to 4.6G, reductions of 88% and 93%, respectively. Furthermore, results on the large general-purpose public object detection dataset MS COCO2017 also show the superiority of our model: its mAP50-95 matches that of YOLOv9-t, with 0.5% higher mAP50, 10% fewer parameters, and 40% fewer FLOPs.
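
The efficiency figures in the abstract can be checked with a short back-of-the-envelope calculation. The sketch below recovers the baseline size implied by the stated reductions (1.8M parameters after an 88% cut, 4.6G FLOPs after a 93% cut). The function name `implied_baseline` is hypothetical, and because the percentages in the abstract are rounded, the recovered values are approximate and may not match the baseline numbers reported in the paper's tables.

```python
# Back-calculate the baseline implied by "reduced to X, a decrease of p%".
# Assumption: the reduction is relative to the baseline, i.e.
#   reduced = baseline * (1 - p)   =>   baseline = reduced / (1 - p)
# The inputs come from the abstract; the outputs are derived, not reported.

def implied_baseline(reduced_value, reduction_fraction):
    """Return the baseline value implied by a relative reduction."""
    if not 0.0 <= reduction_fraction < 1.0:
        raise ValueError("reduction_fraction must be in [0, 1)")
    return reduced_value / (1.0 - reduction_fraction)


# 1.8M parameters after an 88% reduction, 4.6G FLOPs after a 93% reduction.
params_baseline_m = implied_baseline(1.8, 0.88)
flops_baseline_g = implied_baseline(4.6, 0.93)

print(f"implied baseline parameters: {params_baseline_m:.1f}M")
print(f"implied baseline FLOPs: {flops_baseline_g:.1f}G")
```

Under this reading, the stated reductions correspond to a baseline of roughly 15M parameters and roughly 66G FLOPs.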

Paper Structure (15 sections, 2 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Framework of the proposed model. RT-DSAFDet first extracts image features with the backbone network, then fuses information across scales in the multi-scale fusion module using DSAF together with upsampling and downsampling.
  • Figure 2: Structure of the Flexible Attention (FA) module. Unlike an ordinary convolutional block, the FA module uses deformable convolution (DCNv2) to accommodate irregularly shaped targets, and a triad attention mechanism to strengthen the capture of key features and suppress background noise.
  • Figure 3: Structure of the Dynamic Scale-Aware Fusion (DSAF) module. Compared with a common feature extraction module, the DSAF module introduces multiple FA modules and improves multi-scale feature fusion and representation through parallel processing and feature concatenation.
  • Figure 4: Structure of the Spatial Downsampling module. Compared with a common downsampling module, it combines max pooling and average pooling to retain key information more effectively.
  • Figure 5: Recognition results of the compared models on the UAV-PDD2023 dataset.
  • ...and 1 more figure
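
The idea behind the Spatial Downsampling module in Figure 4, combining max pooling and average pooling, can be illustrated with a minimal pure-Python sketch on a single-channel feature map. This is not the authors' implementation: the caption does not say how the two branches are merged, so returning them as two stacked output channels is an assumption, as are the 2x2 window, the stride of 2, and the names `pool2x2` and `spatial_downsample`.

```python
# Hypothetical sketch: 2x2, stride-2 downsampling that keeps both the
# max-pooled and the average-pooled view of a single-channel feature map.
# Assumption: the two branches are stacked as two output channels; the
# figure caption does not specify the merge operation.

def pool2x2(feature_map, reduce_fn):
    """Apply reduce_fn to each non-overlapping 2x2 window of a 2D list."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, height - 1, 2):      # a trailing odd row is dropped
        row = []
        for c in range(0, width - 1, 2):   # a trailing odd column is dropped
            window = [
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


def spatial_downsample(feature_map):
    """Return [max_branch, avg_branch], each at half the input resolution."""
    max_branch = pool2x2(feature_map, max)
    avg_branch = pool2x2(feature_map, lambda w: sum(w) / len(w))
    return [max_branch, avg_branch]


feature_map = [
    [0, 0, 1, 1],
    [0, 8, 1, 1],
    [2, 2, 0, 4],
    [2, 2, 4, 0],
]
max_branch, avg_branch = spatial_downsample(feature_map)
print(max_branch)  # [[8, 1], [2, 4]]
print(avg_branch)  # [[2.0, 1.0], [2.0, 2.0]]
```

In the top-left window, the max branch preserves the isolated peak (8) while the average branch summarizes the surrounding context (2.0). Keeping both views is one plausible way a combined module could retain more information than either pooling operation alone.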