SOD-YOLOv8 -- Enhancing YOLOv8 for Small Object Detection in Traffic Scenes

Boshra Khalili; Andrew W. Smyth

TL;DR

The paper tackles small-object detection in traffic and UAV imagery by extending YOLOv8 with three changes: a GFPN-inspired (Efficient-RepGFPN) multilevel feature fusion with an additional high-resolution detection layer, a new C2f-EMA attention module, and Powerful-IoU (PIoU) in place of CIoU for bounding-box regression, which is intended to improve convergence and stability without significantly increasing computation. The model is validated on VisDrone2019 and on real-world traffic scenes, with reported gains in recall ($40.1\% \to 43.9\%$), precision ($51.2\% \to 53.9\%$), $\text{mAP}_{0.5}$ ($40.6\% \to 45.1\%$), and $\text{mAP}_{0.5:0.95}$ ($24\% \to 26.6\%$). Because the added latency is reported as modest, the approach is suited to UAV-based traffic monitoring and smart-city applications. Future work will probe how well PIoU generalizes across datasets and how robust the model is under diverse and adverse conditions.
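To put the reported numbers in perspective, the short sketch below computes the absolute gain (in percentage points) and the relative gain for each metric. The before and after values are copied from the summary above; the dictionary layout and the `gains` helper are illustrative names, not part of the paper.

```python
# Before/after values in percent, as reported in the summary above.
reported = {
    "recall": (40.1, 43.9),
    "precision": (51.2, 53.9),
    "mAP_0.5": (40.6, 45.1),
    "mAP_0.5:0.95": (24.0, 26.6),
}


def gains(before, after):
    """Return (absolute gain in points, relative gain in percent), rounded to 1 decimal."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 1), round(relative, 1)


# Hypothetical helper structure: metric name -> (absolute, relative) gain.
summary = {name: gains(before, after) for name, (before, after) in reported.items()}

for name, (abs_gain, rel_gain) in summary.items():
    print(f"{name}: +{abs_gain} points ({rel_gain}% relative)")
```

The relative figures are only a convenience for comparison; the paper itself reports the absolute values.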

Abstract

Object detection, as a part of computer vision, is crucial for traffic management, emergency response, autonomous vehicles, and smart cities. Despite significant advances in object detection, detecting small objects in images captured by distant cameras remains challenging due to their size, distance from the camera, varied shapes, and cluttered backgrounds. To address these challenges, we propose Small Object Detection YOLOv8 (SOD-YOLOv8), a novel model specifically designed for scenarios involving numerous small objects. Inspired by Efficient Generalized Feature Pyramid Networks (GFPN), we enhance multi-path fusion within YOLOv8 to integrate features across different levels, preserving details from shallower layers and improving small object detection accuracy. In addition, a fourth detection layer is added to leverage high-resolution spatial information effectively. The Efficient Multi-Scale Attention Module (EMA) in the C2f-EMA module enhances feature extraction by redistributing weights and prioritizing relevant features. We introduce Powerful-IoU (PIoU) as a replacement for CIoU, focusing on moderate-quality anchor boxes and adding a penalty based on differences between predicted and ground truth bounding box corners. This approach simplifies calculations, speeds up convergence, and enhances detection accuracy. SOD-YOLOv8 significantly improves small object detection, surpassing widely used models in various metrics, without substantially increasing computational cost or latency compared to YOLOv8s. Specifically, it increases recall from 40.1\% to 43.9\%, precision from 51.2\% to 53.9\%, $\text{mAP}_{0.5}$ from 40.6\% to 45.1\%, and $\text{mAP}_{0.5:0.95}$ from 24\% to 26.6\%. In dynamic real-world traffic scenes, SOD-YOLOv8 demonstrated notable improvements in diverse conditions, showing its reliability and effectiveness in detecting small objects even in challenging environments.
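The abstract describes PIoU as an IoU loss with an added penalty based on the differences between predicted and ground-truth box corners. The following is a minimal sketch of that idea, not the authors' implementation. The exact penalty form used here (mean edge offset normalized by the target width and height, passed through $1 - e^{-P^2}$) is our recollection of the PIoU formulation and should be treated as an assumption, since this summary does not state the formula. The focusing term for moderate-quality anchor boxes is omitted, and `piou_loss` is a hypothetical name.

```python
import math


def piou_loss(pred, target, eps=1e-9):
    """Sketch of an IoU loss with a corner-offset penalty.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    Assumed form: loss = (1 - IoU) + (1 - exp(-P**2)).
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Standard IoU between the two axis-aligned boxes.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / (union + eps)

    # Penalty P: average distance between corresponding edges,
    # normalized by the target box size (assumption, see lead-in).
    tw, th = tx2 - tx1, ty2 - ty1
    p = (abs(px1 - tx1) / tw + abs(px2 - tx2) / tw
         + abs(py1 - ty1) / th + abs(py2 - ty2) / th) / 4.0

    return (1.0 - iou) + (1.0 - math.exp(-p * p))


# Example: a perfect match versus a prediction shifted right by half the target width.
target_box = (0.0, 0.0, 2.0, 2.0)
perfect = piou_loss(target_box, target_box)
shifted = piou_loss((1.0, 0.0, 3.0, 2.0), target_box)
print(perfect, shifted)
```

Under this assumed form the penalty depends only on the target box size, not on an enclosing box, which is one plausible reading of the abstract's claim that the approach "simplifies calculations".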

Paper Structure (21 sections, 7 equations, 14 figures, 6 tables, 1 algorithm)


Introduction
Related work
Introduction of YOLOv8 Detection Network
Backbone Layer
Neck Layer
Detection Head Layer
Method
Improved GFPN for Multilevel Feature Integration
Embedding Efficient Multi-scale Attention Mechanism in C2f
Improved Bounding Box Loss Function
Results
Dataset
Experimental Environment and Training Strategies
Evaluation metrics
Experiment Results
...and 6 more sections

Figures (14)

Figure 1: The network structure of YOLOv8.
Figure 2: Proposed improved YOLOv8 for small object detection.
Figure 3: Skip-layer links: (a) dense-link: concatenates features from all preceding layers; (b) $\log_2 n$-link: concatenates features from up to $\log_2(l)+1$ layers at each level.
Figure 4: Different Feature Pyramid Network designs: (a) FPN uses a top-down strategy; (b) PANet enhances FPN with a bottom-up pathway; (c) BiFPN integrates cross-scale pathways bidirectionally; (d) GFPN includes a queen-fusion style pathway and skip-layer connections.
Figure 5: Enhanced and efficient GFPN structure.
...and 9 more figures
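The two skip-layer schemes named in the Figure 3 caption can be illustrated by listing which earlier layers feed layer $l$. This is a sketch under an assumption: for the $\log_2 n$-link we take the sources to be the layers at offsets $1, 2, 4, \dots$ behind $l$, which yields at most $\lfloor \log_2(l) \rfloor + 1$ inputs and so matches the caption's count. The function names are hypothetical.

```python
def dense_link_sources(l):
    """Dense-link: layer l concatenates features from all preceding layers 0..l-1."""
    return list(range(l))


def log2n_link_sources(l):
    """log2n-link (assumed form): layer l takes layers l - 2**n for every n with l - 2**n >= 0."""
    sources = []
    step = 1
    while l - step >= 0:
        sources.append(l - step)
        step *= 2
    return sorted(sources)


# Compare how many inputs each scheme needs as depth grows.
for depth in (1, 4, 8):
    print(depth, len(dense_link_sources(depth)), log2n_link_sources(depth))
```

The dense scheme grows linearly with depth, while the assumed $\log_2 n$ scheme grows logarithmically, which is the cost difference the caption hints at.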
