Small Object Detection with YOLO: A Performance Analysis Across Model Versions and Hardware

Muhammad Fasih Tariq, Muhammad Azeem Javed

TL;DR

This work addresses small object detection with YOLO across diverse hardware and software backends. It benchmarks YOLOv5, v8, v9, v10, and v11 on CPUs (OpenVINO, IPEX, and ONNX backends) and GPUs (TensorRT), using the COCO 2017 and DOTAv1.5 datasets at varying image sizes. Key findings show that TensorRT delivers the fastest GPU inference due to kernel fusion, while OpenVINO yields the best CPU performance; YOLOv8 and YOLOv9 demonstrate strong small-object detection on DOTAv1.5, highlighting scale-dependent strengths across models. These insights guide practitioners in selecting model versions and backends under specific hardware constraints to optimize deployment for real-world tasks.

Abstract

This paper provides an extensive evaluation of YOLO object detection models (v5, v8, v9, v10, v11) by comparing their performance across various hardware platforms and optimization libraries. Our study investigates inference speed and detection accuracy on Intel and AMD CPUs using popular libraries such as ONNX and OpenVINO, as well as on GPUs through TensorRT and other GPU-optimized frameworks. Furthermore, we analyze the sensitivity of these YOLO models to object size within the image, examining performance when detecting objects that occupy 1%, 2.5%, and 5% of the total area of the image. By identifying the trade-offs in efficiency, accuracy, and object size adaptability, this paper offers insights for optimal model selection based on specific hardware constraints and detection requirements, aiding practitioners in deploying YOLO models effectively for real-world applications.
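To make the object-size levels concrete, the sketch below computes the fraction of the image area that a bounding box covers and maps it to the nearest of the three levels named in the abstract (1%, 2.5%, 5%). The abstract does not describe how the authors measured or produced these sizes, so the function names, the pixel-coordinate box format `(x_min, y_min, x_max, y_max)`, and the nearest-level rule are all illustrative assumptions rather than the paper's procedure.

```python
# Illustrative sketch only: helper names, box format, and the nearest-level
# rule are assumptions, not the authors' evaluation code.

# Object-size levels from the abstract, as fractions of total image area.
SIZE_LEVELS = (0.01, 0.025, 0.05)


def area_fraction(box, image_width, image_height):
    """Return the fraction of the image area covered by an axis-aligned box.

    `box` is (x_min, y_min, x_max, y_max) in pixels (assumed format).
    The box is clipped to the image bounds before its area is computed.
    """
    x_min, y_min, x_max, y_max = box
    x_min, x_max = max(0, x_min), min(image_width, x_max)
    y_min, y_max = max(0, y_min), min(image_height, y_max)
    width = max(0, x_max - x_min)
    height = max(0, y_max - y_min)
    return (width * height) / (image_width * image_height)


def nearest_size_level(fraction, levels=SIZE_LEVELS):
    """Return the level in `levels` closest to `fraction`."""
    return min(levels, key=lambda level: abs(level - fraction))


if __name__ == "__main__":
    # A 64x64 box in a 640x640 image covers 4096 / 409600 of the area.
    frac = area_fraction((0, 0, 64, 64), 640, 640)
    print(frac, nearest_size_level(frac))
```

For a sense of scale under these assumptions, a 1% object in a 640x640 image is roughly a 64x64 pixel box, and a 5% object is roughly 143x143 pixels.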

Paper Structure

This paper contains 9 sections, 4 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: History of the YOLO series.
  • Figure 2: CPU Performance on the AMD & Intel Systems on multiple inference engines.
  • Figure 3: Inference time on RTX 3070 for all models
  • Figure 4: Effect of image size on throughput on CPU
  • Figure 5: Effect of image size on throughput of GPU
  • ...and 1 more figure