ODverse33: Is the New YOLO Version Always Better? A Multi Domain benchmark from YOLO v5 to v11

Tianyou Jiang, Yang Zhong

TL;DR

ODverse33 tackles whether newer YOLO versions generalize better across domain-specific tasks by evaluating YOLOv5–YOLOv11 on 33 datasets spanning 11 domains with standardized training and COCO-style metrics such as $mAP_{50}$, $mAP_{50-95}$, and per-size $mAP$ ($mAP_{small}$, $mAP_{medium}$, $mAP_{large}$). The paper systematically summarizes innovations from YOLOv1–YOLOv11 and benchmarks post-YOLOv5 variants to quantify domain-dependent gains. Findings show YOLOv11 achieves top overall performance, but certain domains favor earlier versions (e.g., YOLOv9 in industrial/medical) and YOLOv10 can underperform in some cases, underscoring that newer versions are not universally superior. The study provides practical guidance for domain-specific model selection and emphasizes the value of multi-domain benchmarks and openly available resources for real-time object detectors.
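The COCO-style metrics named above can be illustrated with a minimal sketch. It assumes the standard COCO conventions: IoU thresholds from 0.50 to 0.95 in steps of 0.05 for $mAP_{50-95}$, and object-area cutoffs of $32^2$ and $96^2$ pixels for the small/medium/large split. The function names (`iou`, `size_bucket`, `map_50_95`) are hypothetical and are not taken from the paper's code; the paper's exact evaluation settings may differ.

```python
# Illustrative sketch only: helper names are hypothetical, and the
# thresholds follow the usual COCO conventions (an assumption here).

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# IoU thresholds 0.50, 0.55, ..., 0.95 used when averaging AP for mAP_50-95.
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

def size_bucket(area):
    """Assign an object to a size bucket by its pixel area (COCO cutoffs)."""
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"

def map_50_95(ap_by_threshold):
    """Average per-threshold AP values (dict: threshold -> AP) over all ten thresholds."""
    total = sum(ap_by_threshold[t] for t in COCO_IOU_THRESHOLDS)
    return total / len(COCO_IOU_THRESHOLDS)
```

Under these assumptions, $mAP_{50}$ is the AP at the single threshold 0.50, while the per-size variants restrict the evaluation to objects that fall into one bucket.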

Abstract

You Only Look Once (YOLO) models have been widely used for building real-time object detectors across various domains. With new YOLO versions being released at an increasing pace, key questions arise. Are the newer versions always better than their predecessors? What are the core innovations in each YOLO version, and how do these changes translate into real-world performance gains? In this paper, we summarize the key innovations from YOLOv1 to YOLOv11, introduce a comprehensive benchmark called ODverse33, which includes 33 datasets spanning 11 diverse domains (Autonomous driving, Agricultural, Underwater, Medical, Videogame, Industrial, Aerial, Wildlife, Retail, Microscopic, and Security), and explore the practical impact of model improvements in real-world, multi-domain applications through extensive experimental results. We hope this study can provide guidance to the many users of object detection models and serve as a reference for future real-time object detector development.

Paper Structure

This paper contains 12 sections, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Evaluation results of YOLOv5 to YOLOv11. (a) Performance on the COCO validation set (reported in their original projects) and on our ODverse33 validation and test sets. (b) Performance on small, medium, and large objects in the ODverse33 test set. (c) Inference speed per image using a single NVIDIA A100 GPU and number of parameters for each YOLO model, where the size of the circles represents the product of these two metrics.
  • Figure 2: Timeline of YOLO's development, based on the earliest release date of each version's code repository or preprint.
  • Figure 3: Sample images from the 11 domains of the ODverse33 benchmark.
  • Figure 4: Comparison of models developed by different teams.