Comprehensive Performance Evaluation of YOLOv12, YOLO11, YOLOv10, YOLOv9 and YOLOv8 on Detecting and Counting Fruitlet in Complex Orchard Environments

Ranjan Sapkota; Zhichao Meng; Martin Churuvija; Xiaoqiang Du; Zenghong Ma; Manoj Karkee

TL;DR

The paper tackles robust in-field fruitlet detection and counting in complex orchard environments through a cross-version evaluation of YOLOv8–v12 across 26 configurations, using RGB data from an iPhone 14 Pro and Intel RealSense sensors. It measures precision, recall, $mAP@0.50$, and processing speeds, and validates counting accuracy with RMSE and MAE under both smartphone and machine-vision sensing, noting strong sensor-domain effects. Key findings include $mAP@0.50$ peaks of $0.935$ for YOLOv9 Gelan-e and Gelan-base, recall up to $0.900$ for YOLOv12l, and the fastest inference with YOLOv11n at $2.4$ ms. Counting validation highlights YOLOv11n as the most accurate across varieties, with sensor-specific training on RealSense data improving performance. The work demonstrates the practical potential of fast, sensor-robust YOLO configurations for real-time fruit detection and counting in precision agriculture, and outlines avenues for domain adaptation and sensor-diverse training to improve generalizability.
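As a rough illustration of how precision and recall at an IoU threshold of 0.50 can be computed for a single image, the sketch below matches predicted boxes to ground-truth boxes greedily. The function names, the `(x1, y1, x2, y2)` box format, and the greedy one-to-one matching rule are assumptions made for this example; they are not taken from the paper's evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Precision and recall for one image.

    Assumes `predictions` is already sorted by descending confidence
    and that each ground-truth box may be matched at most once.
    """
    matched = set()  # indices of ground-truth boxes already claimed
    true_pos = 0
    for pred in predictions:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            score = iou(pred, gt)
            if score > best_iou:
                best_iou, best_idx = score, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_pos += 1
    false_pos = len(predictions) - true_pos
    false_neg = len(ground_truth) - true_pos
    precision = true_pos / (true_pos + false_pos) if predictions else 0.0
    recall = true_pos / (true_pos + false_neg) if ground_truth else 0.0
    return precision, recall
```

For example, three predicted boxes of which two overlap distinct fruitlets with IoU of at least 0.50 give a precision of 2/3; if the image contains only those two fruitlets, recall is 1.0.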

Abstract

This study performed an extensive real-world evaluation of all configurations of the YOLOv8, YOLOv9, YOLOv10, YOLO11 (or YOLOv11), and YOLOv12 object detection algorithms for immature green apple (or fruitlet) detection in commercial orchards. Performance was assessed in terms of precision, recall, mean Average Precision at 50\% Intersection over Union (mAP@50), and computational speed, including pre-processing, inference, and post-processing times. Additionally, this research performed and validated in-field counting of the fruitlets using an iPhone and machine vision sensors. Among all configurations, YOLOv12l recorded the highest recall at 0.90. YOLOv10x achieved the highest precision at 0.908, while YOLOv9 Gelan-c attained a precision of 0.903. Analysis of mAP@0.50 revealed that YOLOv9 Gelan-base and YOLOv9 Gelan-e reached peak scores of 0.935, with YOLO11s and YOLOv12l following closely at 0.933 and 0.931, respectively. For counting validation using images captured with an iPhone 14 Pro, the YOLO11n configuration demonstrated outstanding accuracy, recording RMSE values of 4.51 for Honeycrisp, 4.59 for Cosmic Crisp, 4.83 for Scilate, and 4.96 for Scifresh; the corresponding MAE values were 4.07, 3.98, 7.73, and 3.85. Similar performance trends were observed with RGB-D sensor data. Moreover, sensor-specific training on Intel RealSense data significantly enhanced model performance. YOLOv11n achieved the fastest inference time of 2.4 ms, outperforming YOLOv8n (4.1 ms), YOLOv9 Gelan-s (11.5 ms), YOLOv10n (5.5 ms), and YOLOv12n (4.6 ms), underscoring its suitability for real-time object detection applications. (YOLOv12 architecture, YOLOv11 architecture, YOLOv12 object detection, YOLOv11 object detection, YOLOv12 segmentation)
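The counting validation above is reported as RMSE and MAE between model counts and reference counts. A minimal sketch of both metrics follows, assuming one predicted count and one manual count per image; the function name `count_errors` and the sample numbers in the usage note are hypothetical and are not data from the study.

```python
import math


def count_errors(predicted_counts, manual_counts):
    """Return (rmse, mae) between per-image predicted and manual counts."""
    if len(predicted_counts) != len(manual_counts) or not manual_counts:
        raise ValueError("count lists must be non-empty and of equal length")
    diffs = [p - m for p, m in zip(predicted_counts, manual_counts)]
    # MAE averages the absolute count errors.
    mae = sum(abs(d) for d in diffs) / len(diffs)
    # RMSE squares the errors first, so large misses weigh more heavily.
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return rmse, mae
```

With hypothetical predicted counts `[10, 12, 8, 15]` and manual counts `[12, 12, 11, 14]`, the errors are -2, 0, -3, and 1, giving an MAE of 1.5 and an RMSE of about 1.87. For any set of errors, MAE cannot exceed RMSE, which offers a quick consistency check on reported metric pairs.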

Paper Structure (16 sections, 7 equations, 7 figures, 3 tables)

Introduction
Evolution of Object Detection and the Emergence of YOLO
Recent YOLO Iterations (YOLOv8 to YOLOv12)
Objectives
Methods
Study Site and Data Acquisition
Data Preparation and Model Training
Performance Evaluation
In-Field Counting Validation
Results and Discussion
Assessment of Detection Accuracy: Precision and Recall Metrics
Evaluation of Detection Consistency: Mean Average Precision at IoU=0.50
Analysis of Computational Efficiency: Image Processing Speed
Field Validation of Counting Accuracy: RMSE and MAE Metrics
Discussion
...and 1 more sections

Figures (7)

Figure 1: Timeline diagram depicting the evolution of YOLO algorithms from YOLOv1’s grid-based detection to YOLOv12’s attention-centric architecture sapkota2024yolov12
Figure 4: YOLO11 and YOLOv12 Architecture Diagram
Figure 5: Illustration of Object Detection (Green Fruitlet) Results of YOLOv12, YOLO11, YOLOv10, YOLOv9, YOLOv8
Figure 6: mAP@50 scores for all tested configurations of YOLOv8, YOLOv9, YOLOv10, YOLO11, and YOLOv12 models
Figure 7: Illustration of In-Field Counting and Validation for Green Fruit Detection by YOLOv8, YOLOv9, YOLOv10, YOLO11 and YOLOv12 configurations
...and 2 more figures
