
Real-time object detection and robotic manipulation for agriculture using a YOLO-based learning approach

Hongyu Zhao, Zezhi Tang, Zhenhong Li, Yi Dong, Yuancheng Si, Mingyang Lu, George Panoutsos

TL;DR

This work tackles real-time crop detection and robotic grasping for harvest automation by integrating YOLO's fast object detection with VGG16's deep feature extraction for grasp-point regression, all trained in a simulated agricultural environment. The two-stage pipeline converts camera imagery from 416×416 to 224×224 between stages, enabling rapid localization and precise grasp point estimation for a vacuum-based UR5 manipulator. The key contribution is the coupled YOLO-VGG architecture that achieves real-time performance and improved grasp accuracy in complex scenes, demonstrated within CoppeliaSim. The approach has practical significance for reducing manual labor in agriculture and lays groundwork for robust operation under variable lighting and foliage conditions.
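The resolution hand-off between the two stages can be sketched as follows. This is a minimal illustration and not the authors' code. It assumes a nearest-neighbour resize of the full 416×416 frame, since the summary states neither the interpolation method nor whether the image is cropped first. It also assumes that the VGG16 stage outputs the grasp point as pixel coordinates in its 224×224 input. `resize_nearest` and `to_detector_frame` are hypothetical helpers.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel intensities

DETECTOR_SIZE = 416   # YOLO input resolution (from the paper summary)
REGRESSOR_SIZE = 224  # VGG16 input resolution (from the paper summary)


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize (assumed interpolation; hypothetical helper)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def to_detector_frame(u: float, v: float,
                      src: int = REGRESSOR_SIZE,
                      dst: int = DETECTOR_SIZE) -> Tuple[float, float]:
    """Map a grasp point from the 224x224 frame back to the 416x416 frame.

    Assumes a uniform resize of the whole image with no cropping or padding.
    """
    return u * dst / src, v * dst / src


if __name__ == "__main__":
    frame = [[(r + c) % 256 for c in range(DETECTOR_SIZE)]
             for r in range(DETECTOR_SIZE)]
    small = resize_nearest(frame, REGRESSOR_SIZE, REGRESSOR_SIZE)
    print(len(small), len(small[0]))        # 224 224
    print(to_detector_frame(112.0, 56.0))   # (208.0, 104.0)
```

Mapping the grasp point back to the detector frame keeps both stages in a single coordinate system. A real pipeline would then convert pixels to robot coordinates using the camera calibration, which is not covered here.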

Abstract

The optimisation of harvesting processes for commonly cultivated crops is central to agricultural industrialisation. Machine vision now enables automated crop identification and has improved harvesting efficiency, but challenges remain. This study presents a new framework that combines two separate convolutional neural network (CNN) architectures to perform both crop detection and harvesting (robotic manipulation) within a simulated environment. To generate the dataset, crop images captured in the simulated environment are augmented with random rotations, cropping, and brightness and contrast adjustments. The You Only Look Once (YOLO) framework is employed with conventional rectangular bounding boxes for crop localisation. The acquired image data are then passed to a Visual Geometry Group (VGG) model to predict grasping positions for robotic manipulation.
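The augmentation step (random rotations, cropping, brightness and contrast adjustments) can be sketched in plain Python. This is illustrative only. Rotations are restricted to multiples of 90° so that the example has no dependencies. Images are grayscale lists of rows. The parameter ranges (`max_delta`, `contrast_range`, `crop_frac`) are assumptions and not values from the paper. In a real dataset, bounding-box and grasp labels would need the same rotation and crop offsets applied. That step is omitted here.

```python
import random
from typing import List, Tuple

Image = List[List[int]]  # grayscale image, values in [0, 255]


def rotate90(img: Image, k: int) -> Image:
    """Rotate clockwise by k * 90 degrees (assumed: right-angle rotations only)."""
    for _ in range(k % 4):
        img = [list(row) for row in zip(*img[::-1])]
    return img


def random_crop(img: Image, frac: float, rng: random.Random) -> Image:
    """Crop a random window whose sides are `frac` of the original (hypothetical)."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, int(h * frac)), max(1, int(w * frac))
    top = rng.randint(0, h - ch)
    left = rng.randint(0, w - cw)
    return [row[left:left + cw] for row in img[top:top + ch]]


def adjust_brightness_contrast(img: Image, delta: float, gain: float) -> Image:
    """Scale pixels around the image mean by `gain`, shift by `delta`, clip to [0, 255]."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [
        [min(255, max(0, round((p - mean) * gain + mean + delta))) for p in row]
        for row in img
    ]


def augment(img: Image, rng: random.Random,
            max_delta: float = 40.0,
            contrast_range: Tuple[float, float] = (0.7, 1.3),
            crop_frac: float = 0.8) -> Image:
    """Apply one random rotation, crop and photometric change (assumed order)."""
    out = rotate90(img, rng.randint(0, 3))
    out = random_crop(out, crop_frac, rng)
    return adjust_brightness_contrast(
        out,
        delta=rng.uniform(-max_delta, max_delta),
        gain=rng.uniform(*contrast_range),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    base = [[(r * 8 + c) % 256 for c in range(32)] for r in range(32)]
    dataset = [augment(base, rng) for _ in range(5)]
    print([(len(a), len(a[0])) for a in dataset])  # each crop is 25x25
```

Seeding a single `random.Random` instance makes the generated dataset reproducible. Photometric changes are applied last so that clipping to [0, 255] happens only once.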

Paper Structure

This paper contains 18 sections, 4 equations, 9 figures.

Figures (9)

  • Figure 1: CNN and FC structure.
  • Figure 2: Procedure of 2-D CNN.
  • Figure 3: Simulation Environment
  • Figure 4: YOLO Structure
  • Figure 5: Training Result of YOLO
  • ...and 4 more figures