Robotic Grasping of Harvested Tomato Trusses Using Vision and Online Learning

Luuk van den Bent, Tomás Coleman, Robert Babuška

TL;DR

This work tackles automated grasping of harvested tomato trusses from cluttered crates by integrating a three-stage perception pipeline (detection, grasp-pose identification, and ranking) with an online-learning framework to select robust peduncle grasps. A YOLO-based detector localizes unobstructed trusses, a modified YOLOv7-Pose model proposes candidate grasp poses from close-up RGB views, and an autoencoder-KNN system ranks these poses while updating online from grasp outcomes. Extensive lab experiments with a Panda robot and an eye-in-hand RGB-D camera demonstrate strong performance, including 100% pile clearance with retries and 93% first-attempt success, highlighting the method’s potential for industrial automation in post-harvest handling. The results indicate that learning-based grasp pose ranking and perception simplification can achieve reliable, scalable tomato-truss manipulation, with future work focusing on better grippers, collision awareness, and generalization to other produce.

Abstract

Currently, truss tomato weighing and packaging require significant manual work. The main obstacle to automation lies in the difficulty of developing a reliable robotic grasping system for already harvested trusses. We propose a method to grasp trusses that are stacked in a crate with considerable clutter, which is how they are commonly stored and transported after harvest. The method consists of a deep learning-based vision system to first identify the individual trusses in the crate and then determine a suitable grasping location on the stem. To this end, we have introduced a grasp pose ranking algorithm with online learning capabilities. After selecting the most promising grasp pose, the robot executes a pinch grasp without needing touch sensors or geometric models. Lab experiments with a robotic manipulator equipped with an eye-in-hand RGB-D camera showed a 100% clearance rate when tasked to pick all trusses from a pile. 93% of the trusses were successfully grasped on the first try, while the remaining 7% required more attempts.
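
The idea of ranking grasp poses and learning online from grasp outcomes can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `KnnGraspRanker`, its parameters, and the two-dimensional embeddings are hypothetical, and the sketch simply assumes that each candidate grasp has already been encoded as a feature vector (for example by an autoencoder). A candidate is scored by the mean outcome of its k nearest previously attempted grasps, and every executed grasp is added to the memory.

```python
import numpy as np


class KnnGraspRanker:
    """Hypothetical k-NN ranker over grasp-pose embeddings (illustrative only)."""

    def __init__(self, k=3, prior=0.5):
        self.k = k
        self.prior = prior      # score returned while the memory is still empty
        self.embeddings = []    # feature vectors of previously attempted grasps
        self.outcomes = []      # 1.0 for a successful grasp, 0.0 for a failure

    def score(self, embedding):
        """Estimate success as the mean outcome of the k nearest past grasps."""
        if not self.embeddings:
            return self.prior
        memory = np.stack(self.embeddings)
        query = np.asarray(embedding, dtype=float)
        dists = np.linalg.norm(memory - query, axis=1)
        nearest = np.argsort(dists)[: self.k]
        return float(np.mean(np.asarray(self.outcomes)[nearest]))

    def rank(self, candidates):
        """Return candidate indices sorted from most to least promising."""
        scores = [self.score(c) for c in candidates]
        return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)

    def update(self, embedding, success):
        """Online update: store the attempted grasp together with its outcome."""
        self.embeddings.append(np.asarray(embedding, dtype=float))
        self.outcomes.append(1.0 if success else 0.0)


# Usage with made-up 2-D embeddings: one past success and one past failure.
ranker = KnnGraspRanker(k=1)
ranker.update([0.0, 0.0], success=True)
ranker.update([1.0, 1.0], success=False)
order = ranker.rank([[0.9, 1.1], [0.1, -0.1]])
print(order)  # the candidate closest to the past success is ranked first
```

A memory-based ranker of this kind needs no retraining step: each new outcome is appended and takes effect on the next query, which is one simple way to realize the online adaptation described above.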

Paper Structure (24 sections, 1 equation, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Harvested tomato trusses are stacked in a crate before they enter the packaging process.
  • Figure 2: Suitable grasp poses on the peduncle for grasping a tomato truss. The yellow dots represent the positions, and the purple rectangles indicate the orientations of the grasps.
  • Figure 3: Overview of the method. First, the truss to be grasped is detected (step A). The robot arm then moves the end-effector with the camera above this truss to take a close-up image, in which suitable grasp poses on the peduncle are identified (step B). Finally, the grasp pose ranking algorithm finds the most suitable pose (step C) and the robot executes the grasp. Based on the grasp success or failure, the ranking model is adapted (dashed line in step C).
  • Figure 4: Example of the grasp pose identification network evaluation. The ground truth grasp poses are shown as orange dots with a green line for the orientation whilst the predictions are shown in blue. The green circles show the distance threshold in which the position prediction has to be located to be considered correct.
  • Figure 5: Boxplots displaying the distance and angle errors of the correctly predicted keypoints on the validation set of the grasp pose identification network.
  • ...and 2 more figures