Comparative Analysis of YOLOv9, YOLOv10 and RT-DETR for Real-Time Weed Detection

Ahmet Oğuz Saltık, Alicia Allmendinger, Anthony Stein

TL;DR

This work addresses real-time weed detection for precision spraying by benchmarking YOLOv9, YOLOv10, and RT-DETR on field data with three classes (Sugarbeet, Monocot, Dicot) across multiple image resolutions and hardware platforms. It evaluates accuracy using $mAP_{50}$ and $mAP_{50-95}$ alongside inference time, with models trained in PyTorch/Ultralytics and deployed via TensorRT/OpenVINO. Larger input sizes generally improve accuracy but increase latency. YOLOv9 and YOLOv10 offer favorable speed–accuracy trade-offs, while RT-DETR achieves competitive accuracy at higher resolutions at the cost of longer runtimes. The results guide the choice of model and input resolution for real-time smart spraying in agricultural robotics, and the work suggests SAHI integration, model compression, and cross-dataset validation as future directions for improving generalizability.
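To make the two accuracy metrics concrete, the sketch below assumes the standard COCO-style definitions commonly used with Ultralytics models: $mAP_{50}$ counts a prediction as correct when its intersection over union (IoU) with a ground-truth box is at least 0.50, while $mAP_{50-95}$ averages AP over the IoU thresholds 0.50, 0.55, ..., 0.95. The helper names `iou` and `thresholds_passed` are illustrative and do not come from the paper, and the code only shows the IoU test, not a full AP computation.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95 (ten values).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """Return the IoU thresholds at which the prediction matches the ground truth."""
    value = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if value >= t]


if __name__ == "__main__":
    # Hypothetical boxes, chosen only to illustrate the thresholds.
    ground_truth = (0, 0, 10, 10)
    prediction = (0, 0, 10, 8.2)
    print(iou(prediction, ground_truth))
    print(thresholds_passed(prediction, ground_truth))
```

A loosely localized box can therefore count toward $mAP_{50}$ yet fail the stricter thresholds, which is why $mAP_{50-95}$ is typically the lower of the two scores.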

Abstract

This paper presents a comprehensive evaluation of state-of-the-art object detection models, including YOLOv9, YOLOv10, and RT-DETR, for weed detection in smart-spraying applications, focusing on three classes: Sugarbeet, Monocot, and Dicot. The models are compared on mean Average Precision (mAP) scores and on inference times measured on different GPU and CPU devices. We consider several model variants (nano, small, medium, and large) alongside different image resolutions (320px, 480px, 640px, 800px, 960px). The results highlight the trade-offs between inference time and detection accuracy, providing guidance for selecting the most suitable model for real-time weed detection. This study aims to guide the development of efficient and effective smart spraying systems, enhancing agricultural productivity through precise weed management.
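One way to reason about the trade-off between inference time and accuracy is to keep only the configurations that no other configuration beats on both axes, and then pick the most accurate one that fits a latency budget. The following is a minimal sketch of that idea, not the selection procedure used in the paper. The `Candidate` type, the function names, and all numbers in the usage example are hypothetical placeholders, not measurements from this study.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    name: str          # e.g. a model variant at a given input resolution
    map50: float       # accuracy score, higher is better
    latency_ms: float  # inference time per image, lower is better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates that are not dominated in both accuracy and latency."""
    front = []
    for c in candidates:
        dominated = any(
            o.map50 >= c.map50
            and o.latency_ms <= c.latency_ms
            and (o.map50 > c.map50 or o.latency_ms < c.latency_ms)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.latency_ms)


def best_within_budget(
    candidates: List[Candidate], max_latency_ms: float
) -> Optional[Candidate]:
    """Most accurate candidate whose latency fits the budget, or None."""
    eligible = [c for c in candidates if c.latency_ms <= max_latency_ms]
    return max(eligible, key=lambda c: c.map50) if eligible else None


if __name__ == "__main__":
    # Placeholder values for illustration only; NOT results from the paper.
    demo = [
        Candidate("config_a", 0.70, 5.0),
        Candidate("config_b", 0.80, 10.0),
        Candidate("config_c", 0.75, 12.0),
        Candidate("config_d", 0.85, 30.0),
    ]
    print([c.name for c in pareto_front(demo)])
    print(best_within_budget(demo, max_latency_ms=15.0))
```

In a smart-spraying setting the latency budget would presumably follow from vehicle speed and camera frame rate, so the same set of benchmarks can lead to different choices on different hardware.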

Paper Structure

This paper contains 22 sections, 8 figures, 1 table.

Figures (8)

  • Figure 2: Machine Learning workflow as applied in this study.
  • Figure 3: mAP50 vs inference time comparison on Intel Core i9-14900K (32-core) CPU.
  • Figure 4: mAP50-95 vs inference time comparison on Intel Core i9-14900K (32-core) CPU.
  • Figure 5: Comparison of ground truth and prediction results from various models using 960-pixel image resolution.
  • Figure 6: Time distribution analyses for diverse models utilizing 640-pixel image resolution on NVIDIA RTX4090 GPU.
  • ...and 3 more figures