YOLO Evolution: A Comprehensive Benchmark and Architectural Review of YOLOv12, YOLO11, and Their Previous Versions

Nidhal Jegham, Chan Young Koh, Marwan Abdelatti, Abdeltawab Hendawi

TL;DR

This paper addresses the need for a comprehensive, multi-metric benchmark of the YOLO family from v3 to v12 across diverse datasets with challenging object properties. It compares Ultralytics implementations against original versions, analyzes models ranging from nano to extra-large, and evaluates Precision, Recall, $mAP_{50}$, $mAP_{50-95}$, preprocessing/inference/postprocessing times, GFLOPs, and model size. The study finds that the YOLO11 family generally offers the best balance of accuracy and efficiency, that YOLOv12 adds architectural complexity with limited practical gains, and that YOLOv10 excels in speed and resource efficiency. These results inform model selection for real-time deployment, constrained hardware, and tasks involving small, large, or densely packed objects, and they guide future architectural refinements aimed at better latency-accuracy trade-offs.

Abstract

This study presents a benchmark analysis of various YOLO (You Only Look Once) algorithms. It represents the first comprehensive experimental evaluation of YOLOv3 through the latest version, YOLOv12, on a range of object detection challenges. The challenges considered include varying object sizes, diverse aspect ratios, and small objects of a single class, so that each dataset stresses the models in a distinct way. To ensure a robust evaluation, we employ a broad set of metrics: Precision, Recall, Mean Average Precision (mAP), Processing Time, GFLOPs count, and Model Size. Our analysis highlights the distinctive strengths and limitations of each YOLO version. For example, YOLOv9 achieves substantial accuracy but struggles with small-object detection and with efficiency, whereas YOLOv10 shows relatively lower accuracy, owing to architectural choices that hurt its detection of overlapping objects, but excels in speed and efficiency. Additionally, the YOLO11 family consistently shows superior performance, maintaining a remarkable balance of accuracy and efficiency. However, YOLOv12 delivered underwhelming results, with its complex architecture introducing computational overhead without significant performance gains. These results provide critical insights for both industry and academia, facilitating the selection of the most suitable YOLO algorithm for diverse applications and guiding future enhancements.
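
The accuracy metrics named above can be made concrete with a short sketch. It assumes the conventional definitions, which this page does not spell out: Precision = TP / (TP + FP), Recall = TP / (TP + FN), a detection counting as a true positive when its IoU with a ground-truth box reaches the threshold (0.5 for $mAP_{50}$), and $mAP_{50-95}$ averaging over IoU thresholds 0.50 to 0.95 in steps of 0.05. The function names are illustrative and are not taken from the paper's code.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and Recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def map_iou_thresholds() -> List[float]:
    """IoU thresholds averaged over for mAP50-95 (0.50, 0.55, ..., 0.95)."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(precision_recall(tp=8, fp=2, fn=2))
    print(map_iou_thresholds())
```

Under these definitions a stricter IoU threshold can only turn true positives into false positives, which is why $mAP_{50-95}$ is never higher than $mAP_{50}$ for the same model.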

Paper Structure

This paper contains 57 sections, 24 figures, 5 tables.

Figures (24)

  • Figure 1: Evolution of YOLO Algorithms throughout the years
  • Figure 2: Class Distribution of the Traffic Signs Dataset
  • Figure 3: Class Distribution of the Africa Wildlife Dataset
  • Figure 4: YOLO versions and scaled versions
  • Figure 5: YOLOv3 architecture showcasing the residual blocks and the upsampling layers [yolov3_benchmark]
  • ...and 19 more figures