
Detection of On-Ground Chestnuts Using Artificial Intelligence Toward Automated Picking

Kaixuan Fang, Yuzhen Lu, Xinyang Mu

TL;DR

This study tackles the problem of detecting chestnuts on orchard floors to support vision-guided autonomous harvesting for small- to mid-scale producers. It builds and annotates a 319-image dataset (6,524 chestnut instances) from commercial Michigan orchards and benchmarks 29 real-time detectors (14 YOLO and 15 RT-DETR variants at varied model scales) across YOLOv11–v13 and RT-DETRv1–v4, using Monte Carlo cross-validation and 200 training epochs on 1024×1024 inputs. The results show that YOLOv11 variants offer the best balance of accuracy and speed, with YOLOv12-m achieving the highest mAP@0.5 (95.1%) and YOLOv11-x delivering the strongest localization under strict IoU thresholds (mAP@[0.5:0.95] of 80.1%). RT-DETR models achieve competitive precision but are generally slower, with RT-DETRv2-R101 performing best among them (mAP@0.5 of 91.1%); overall, YOLO-based detectors are more suitable for on-board, real-time chestnut detection in orchard environments. The publicly available dataset and code provide a benchmark for future research and a foundation for developing vision-guided chestnut-picking systems for smallholders.
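To illustrate the Monte Carlo cross-validation mentioned above, the following minimal sketch repeatedly reshuffles the 319 image indices into training, validation, and test subsets. The function name `monte_carlo_splits`, the 70/15/15 split ratio, the five replications, and the seed are all assumptions chosen for illustration; the summary does not state the values the authors used.

```python
import random


def monte_carlo_splits(n_images, n_repeats, train_frac, val_frac, seed=0):
    """Yield (train, val, test) index lists from repeated random shuffles.

    Each replication draws a fresh random partition of the same image pool,
    so metrics can be averaged over replications.
    """
    rng = random.Random(seed)
    indices = list(range(n_images))
    n_train = int(n_images * train_frac)
    n_val = int(n_images * val_frac)
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        train = shuffled[:n_train]
        val = shuffled[n_train:n_train + n_val]
        test = shuffled[n_train + n_val:]  # remainder goes to the test subset
        yield train, val, test


# Hypothetical settings: 5 replications with a 70/15/15 split of 319 images.
splits = list(monte_carlo_splits(n_images=319, n_repeats=5,
                                 train_frac=0.7, val_frac=0.15))
```

Each detector would then be trained and evaluated once per replication, and the reported accuracy would be the mean over replications.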

Abstract

Traditional mechanized chestnut harvesting is too costly for small producers, non-selective, and prone to damaging nuts. Accurate, reliable detection of chestnuts on the orchard floor is crucial for developing low-cost, vision-guided automated harvesting technology. However, reliable chestnut detection is difficult in complex environments with shading, varying natural light, and interference from weeds, fallen leaves, stones, and other on-ground objects, and these challenges have remained unaddressed. This study collected 319 images of chestnuts on the orchard floor, containing 6,524 annotated chestnuts. A comprehensive set of 29 state-of-the-art real-time object detectors, comprising 14 in the YOLO (v11-v13) family and 15 in the RT-DETR (v1-v4) family at varied model scales, was systematically evaluated for chestnut detection through replicated modeling experiments. Experimental results show that the YOLOv12m model achieved the best mAP@0.5 of 95.1% among all evaluated models, while RT-DETRv2-R101 was the most accurate RT-DETR variant, with an mAP@0.5 of 91.1%. In terms of mAP@[0.5:0.95], the YOLOv11x model achieved the best accuracy of 80.1%. All models demonstrated significant potential for real-time chestnut detection, and YOLO models outperformed RT-DETR models in both detection accuracy and inference speed, making them better suited for on-board deployment. Both the dataset and the software programs from this study are publicly available at https://github.com/AgFood-Sensing-and-Intelligence-Lab/ChestnutDetection.
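The two reported metrics differ only in how strictly a predicted box must overlap a labeled chestnut. The sketch below, which is an illustration and not the authors' evaluation code, computes intersection over union (IoU) for two boxes and lists the IoU thresholds at which a prediction would count as a match under the usual COCO-style convention, where mAP@[0.5:0.95] averages over thresholds from 0.50 to 0.95 in steps of 0.05. The box format `(x_min, y_min, x_max, y_max)` and the helper names are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes have no intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Thresholds 0.50, 0.55, ..., 0.95 used when averaging for mAP@[0.5:0.95].
IOU_THRESHOLDS = [0.5 + 0.05 * i for i in range(10)]


def matched_thresholds(pred_box, gt_box):
    """Return the thresholds at which the prediction overlaps enough to match."""
    score = iou(pred_box, gt_box)
    return [round(t, 2) for t in IOU_THRESHOLDS if score >= t]


# Hypothetical boxes: a prediction shifted 10 px to the right of a 100x100 label.
gt = (0, 0, 100, 100)
pred = (10, 0, 110, 100)
print(iou(pred, gt))                 # 9000 / 11000, about 0.818
print(matched_thresholds(pred, gt))  # matches at 0.5 through 0.8 only
```

A prediction like this one counts fully toward mAP@0.5 but fails at the 0.85, 0.90, and 0.95 thresholds, which is why mAP@[0.5:0.95] rewards tighter localization.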

Paper Structure (19 sections, 10 figures, 3 tables)

This paper contains 19 sections, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Example of original images and annotation images of on-ground chestnuts. The green bounding boxes indicate the labeled chestnuts.
  • Figure 2: The histogram of the annotated chestnuts per image.
  • Figure 3: The proposed pipeline of chestnut detection by YOLO/RT-DETR object detectors.
  • Figure 4: Training curves of mAP@0.5 and mAP@[0.5:0.95] for the YOLO models for chestnut detection.
  • Figure 5: Test image of YOLOv12-m under complex lighting and severe occlusion conditions.
  • ...and 5 more figures