SL-YOLO: A Stronger and Lighter Drone Target Detection Model

Defan Chen, Luchan Zhang

TL;DR

SL-YOLO targets small-object detection in drone imagery. It augments YOLOv8s with a Hierarchical Extended Path Aggregation Network (HEPAN) for cross-scale feature fusion, two lightweight modules (C2fDCB and SCDown), and a dedicated small-target detection head. On VisDrone2019, mAP$_{0.5}$ rises from $43.0\%$ to $46.9\%$ and mAP$_{0.5:0.95}$ from $26.0\%$ to $28.9\%$, while the parameter count falls from $11.1\mathrm{M}$ to $9.6\mathrm{M}$ and the model runs at $132$ FPS. The ablation study confirms that each component (P2 head, HEPAN, C2fDCB, SCDown) contributes to improved detection at small scales, with some trade-off in GFLOPs. Overall, SL-YOLO demonstrates a favorable balance of accuracy, speed, and parameter count, making it well suited to real-time drone monitoring and disaster-response applications; future work will pursue greater robustness and adaptability across diverse drone scenarios.
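To give a feel for why a downsampling module such as SCDown can cut parameters, the sketch below counts convolution weights for two ways of halving resolution. It assumes SCDown follows the spatial-channel decoupled design introduced with YOLOv10 (a 1x1 pointwise convolution followed by a stride-2 depthwise convolution); the text above does not spell out its internals, and the channel widths and function names are hypothetical, chosen only for illustration.

```python
def standard_downsample_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in one dense k x k stride-2 convolution (bias and norm layers ignored)."""
    return k * k * c_in * c_out


def decoupled_downsample_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights when channel change and spatial reduction are split (assumed SCDown layout)."""
    pointwise = c_in * c_out   # 1x1 convolution adjusts the channel count
    depthwise = k * k * c_out  # k x k depthwise convolution (one filter per channel) downsamples
    return pointwise + depthwise


if __name__ == "__main__":
    c_in, c_out = 256, 512  # hypothetical channel widths, not taken from the paper
    dense = standard_downsample_params(c_in, c_out)
    split = decoupled_downsample_params(c_in, c_out)
    print(f"dense 3x3 stride-2 conv: {dense:,} weights")
    print(f"pointwise + depthwise:   {split:,} weights")
    print(f"ratio: {dense / split:.1f}x fewer weights")
```

Under these assumed widths the dense layer holds 1,179,648 weights against 135,680 for the decoupled pair. This illustrates the general mechanism only; it is not a reconstruction of SL-YOLO's actual layer sizes.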

Abstract

Detecting small objects in complex scenes, such as those captured by drones, is challenging because the features of small targets are difficult to capture. While the YOLO family has achieved great success in large target detection, its performance is less satisfactory on small targets. To address this, this paper proposes SL-YOLO (Stronger and Lighter YOLO), a model that aims to break the bottleneck of small target detection. We propose the Hierarchical Extended Path Aggregation Network (HEPAN), a cross-scale feature fusion method designed to maintain detection accuracy even in challenging environments. In addition, without sacrificing detection capability, we design the lightweight C2fDCB module and add the SCDown downsampling module to reduce the model's parameters and computational complexity. Experimental results on the VisDrone2019 dataset show a significant improvement in performance, with mAP@0.5 rising from 43.0% to 46.9% and mAP@0.5:0.95 from 26.0% to 28.9%. The model parameters are also reduced from 11.1M to 9.6M, and the FPS can reach 132, making the model well suited to real-time small object detection in resource-constrained environments.
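The reported figures are easier to compare as absolute and relative changes. The short sketch below computes both from the numbers quoted in the abstract; the dictionary keys and the `change` helper are illustrative names, not anything defined by the paper.

```python
# (baseline YOLOv8s, SL-YOLO) values as quoted in the abstract.
reported = {
    "mAP@0.5 (%)": (43.0, 46.9),
    "mAP@0.5:0.95 (%)": (26.0, 28.9),
    "Parameters (M)": (11.1, 9.6),
}


def change(before: float, after: float) -> tuple[float, float]:
    """Return (absolute difference, relative difference in percent of the baseline)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return absolute, relative


if __name__ == "__main__":
    for name, (before, after) in reported.items():
        absolute, relative = change(before, after)
        print(f"{name}: {before} -> {after} ({absolute:+.1f} absolute, {relative:+.1f}% relative)")
```

Read this way, the mAP@0.5 gain is 3.9 points (about 9.1% relative to the baseline), the mAP@0.5:0.95 gain is 2.9 points (about 11.2% relative), and the parameter count drops by 1.5M (about 13.5%).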

Paper Structure

This paper contains 14 sections, 1 equation, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Comparison of detection results between the YOLOv8s and SL-YOLO models.
  • Figure 2: The overall structure of our SL-YOLO model.
  • Figure 3: The schematic diagrams of network structures: (a) PANet; (b) BiFPN; (c) HEPAN.
  • Figure 4: The overall structure of the C2fDCB module.