YOLOv5, YOLOv8 and YOLOv10: The Go-To Detectors for Real-time Vision
Muhammad Hussain
TL;DR
This paper surveys the evolution of the YOLO family with a focus on YOLOv5, YOLOv8, and YOLOv10, examining architectural shifts, performance trends, and suitability for edge deployment. It highlights how CSPDarknet-based backbones, PANet-based feature fusion, and successive detection-head designs improved the speed-accuracy trade-off, culminating in YOLOv10’s NMS-free training and inference, spatial-channel decoupled downsampling, and large-kernel convolutions. The analysis shows a clear trajectory toward more efficient, scalable, and edge-friendly detectors, and offers guidance on selecting a variant based on hardware constraints and application needs. Overall, the work underscores the practical impact of these advances for real-time vision in resource-constrained environments.
Abstract
This paper presents a comprehensive review of the evolution of the YOLO (You Only Look Once) object detection algorithm, focusing on YOLOv5, YOLOv8, and YOLOv10. We analyze the architectural advancements, performance improvements, and suitability for edge deployment across these versions. YOLOv5 introduced significant innovations such as the CSPDarknet backbone and mosaic augmentation, balancing speed and accuracy. YOLOv8 built upon this foundation with enhanced feature extraction and anchor-free detection, improving versatility and performance. YOLOv10 represents a leap forward with NMS-free training, spatial-channel decoupled downsampling, and large-kernel convolutions, achieving state-of-the-art performance with reduced computational overhead. Our findings highlight progressive gains in accuracy, efficiency, and real-time performance, with particular emphasis on applicability in resource-constrained environments. This review provides insights into the trade-offs between model complexity and detection accuracy, offering guidance for selecting the most appropriate YOLO version for specific edge computing applications.
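For readers unfamiliar with the post-processing step that NMS-free designs aim to eliminate, the following is a minimal sketch of classic greedy non-maximum suppression (NMS) in plain Python. It is an illustration only, not code from any YOLO release: the names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.5 are all assumptions chosen for this example.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is kept only if its IoU with every already-kept box is at or
    below the threshold; otherwise it is treated as a duplicate detection.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two heavily overlapping boxes and one distant box (hypothetical data).
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))
```

In this sketch each candidate is compared against every box already kept, so the cost grows with the number of overlapping predictions and adds to the inference time on top of the network itself. That per-image cost is the overhead an NMS-free detector avoids.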
