YOLOv1 to YOLOv10: The fastest and most accurate real-time object detection systems
Chien-Yao Wang, Hong-Yuan Mark Liao
TL;DR
This survey traces the decade-long evolution of the YOLO family from YOLOv1 to YOLOv10, highlighting the core design philosophies that enabled real-time, edge-friendly object detection. It analyzes the architectural innovations, training techniques, and label-assignment strategies that yielded steadily improving speed and accuracy. It also covers cross-domain extensions to tracking, segmentation, autonomous driving, pose estimation, 3D perception, and open-vocabulary tasks. The paper underscores YOLO's influence on subsequent computer vision research and its role as a versatile platform for integration with transformers, neural architecture search (NAS), multimodal models, and lightweight hardware-focused designs. It provides a structured view of how simpler, faster, and stronger YOLO variants have driven practical deployment and inspired broader developments in computer vision and language-model-enabled perception.
Abstract
This is a comprehensive review of the YOLO series of systems. Unlike previous surveys, this article re-examines the characteristics of the YOLO series from the perspective of the latest techniques. We also analyze how the YOLO series has continued to influence and advance research in real-time computer vision, and how it has contributed to the subsequent development of computer vision and language models. We take a closer look at how the methods proposed by the YOLO series over the past ten years have affected the development of subsequent technologies, and we present applications of YOLO across various fields. We hope this article can serve as a useful guide for future work in real-time computer vision.
