ODverse33: Is the New YOLO Version Always Better? A Multi Domain benchmark from YOLO v5 to v11
Tianyou Jiang, Yang Zhong
TL;DR
ODverse33 asks whether newer YOLO versions generalize better across domain-specific tasks by evaluating YOLOv5–YOLOv11 on 33 datasets spanning 11 domains with standardized training and COCO-style metrics such as $mAP_{50}$, $mAP_{50-95}$, and per-size $mAP$ ($mAP_{small}$, $mAP_{medium}$, $mAP_{large}$). The paper summarizes the innovations from YOLOv1 to YOLOv11 and benchmarks YOLOv5 through YOLOv11 to quantify domain-dependent gains. YOLOv11 achieves the top overall performance, but some domains favor earlier versions (e.g., YOLOv9 in industrial and medical) and YOLOv10 underperforms in some cases, so newer versions are not universally superior. The study offers practical guidance for domain-specific model selection and emphasizes the value of multi-domain benchmarks and openly available resources for real-time object detectors.
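For readers unfamiliar with the metrics named above, the standard COCO convention defines $mAP_{50-95}$ as the mean of $mAP$ over ten IoU thresholds. That ODverse33 follows this convention exactly is an assumption here, since the summary only describes the metrics as COCO-style:

$$
mAP_{50-95} = \frac{1}{10} \sum_{t \in \{0.50,\, 0.55,\, \ldots,\, 0.95\}} mAP_{t}
$$

Here $mAP_{t}$ is the class-averaged average precision when a detection counts as correct at IoU $\geq t$, so $mAP_{50}$ is the single term with $t = 0.50$. Under the same convention, $mAP_{small}$, $mAP_{medium}$, and $mAP_{large}$ restrict the evaluation to objects with pixel area below $32^2$, between $32^2$ and $96^2$, and above $96^2$, respectively.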
Abstract
You Only Look Once (YOLO) models have been widely used to build real-time object detectors across various domains. As new YOLO versions are released with increasing frequency, key questions arise. Are newer versions always better than their predecessors? What are the core innovations in each YOLO version, and how do these changes translate into real-world performance gains? In this paper, we summarize the key innovations from YOLOv1 to YOLOv11, introduce a comprehensive benchmark called ODverse33, which includes 33 datasets spanning 11 diverse domains (Autonomous driving, Agricultural, Underwater, Medical, Videogame, Industrial, Aerial, Wildlife, Retail, Microscopic, and Security), and explore the practical impact of model improvements in real-world, multi-domain applications through extensive experimental results. We hope this study can guide the many users of object detection models and serve as a reference for future real-time object detector development.
