Robotic Grasping of Harvested Tomato Trusses Using Vision and Online Learning
Luuk van den Bent, Tomás Coleman, Robert Babuška
TL;DR
This work tackles automated grasping of harvested tomato trusses from cluttered crates. It integrates a three-stage perception pipeline (detection, grasp-pose identification, and ranking) with an online-learning framework that selects robust peduncle grasps. A YOLO-based detector localizes unobstructed trusses, a modified YOLOv7-Pose model proposes candidate grasp poses from close-up RGB views, and an autoencoder-KNN system ranks these poses while updating online from grasp outcomes. Lab experiments with a Panda robot and an eye-in-hand RGB-D camera show a 100% pile-clearance rate when retries are allowed and a 93% first-attempt success rate, highlighting the method’s potential for industrial automation in post-harvest handling. The results indicate that learning-based grasp-pose ranking, combined with a simplified perception pipeline, can achieve reliable and scalable tomato-truss manipulation. Future work will focus on better grippers, collision awareness, and generalization to other produce.
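To make the ranking-with-online-learning idea concrete, here is a minimal sketch of a k-nearest-neighbour ranker that scores candidate grasp poses by the outcomes of similar past grasps and is updated after every executed grasp. It is an illustration under stated assumptions, not the authors' implementation: it assumes each candidate pose has already been encoded into a fixed-length feature vector (for example, an autoencoder latent code), and the class name `OnlineKNNRanker` and the parameters `k` and `prior` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


class OnlineKNNRanker:
    """Hypothetical k-NN grasp-pose ranker with online updates.

    Assumes every candidate grasp pose is already encoded as a
    fixed-length feature vector (e.g. an autoencoder latent code).
    """

    def __init__(self, k: int = 5, prior: float = 0.5) -> None:
        self.k = k
        self.prior = prior  # score returned before any grasp outcome is stored
        self.memory: List[Tuple[Tuple[float, ...], float]] = []

    def score(self, z: Sequence[float]) -> float:
        """Estimate success as the mean outcome of the k nearest stored grasps."""
        if not self.memory:
            return self.prior
        # Sort stored grasps by Euclidean distance to the query embedding.
        dists = sorted((math.dist(z, stored), outcome) for stored, outcome in self.memory)
        nearest = dists[: self.k]  # fewer than k if memory is still small
        return sum(outcome for _, outcome in nearest) / len(nearest)

    def rank(self, candidates: Sequence[Sequence[float]]) -> List[int]:
        """Return candidate indices ordered from most to least promising."""
        scores = [self.score(z) for z in candidates]
        return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)

    def update(self, z: Sequence[float], success: bool) -> None:
        """Online learning step: store the outcome of an executed grasp."""
        self.memory.append((tuple(z), 1.0 if success else 0.0))


# Usage with toy 2-D embeddings (illustrative values only).
ranker = OnlineKNNRanker(k=1)
ranker.update([0.0, 0.0], True)   # a grasp near this embedding succeeded
ranker.update([1.0, 1.0], False)  # a grasp near this embedding failed
print(ranker.rank([[0.9, 1.0], [0.1, 0.0]]))  # prints [1, 0]
```

Because the memory simply grows with each executed grasp, the ranking adapts without retraining; a practical system would likely also cap or weight the memory, which this sketch omits.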
Abstract
Currently, truss tomato weighing and packaging require significant manual work. The main obstacle to automation is the difficulty of developing a reliable robotic grasping system for already harvested trusses. We propose a method to grasp trusses that are stacked in a crate with considerable clutter, which is how they are commonly stored and transported after harvest. The method uses a deep-learning-based vision system that first identifies the individual trusses in the crate and then determines a suitable grasping location on the stem. To this end, we introduce a grasp-pose ranking algorithm with online learning capabilities. After selecting the most promising grasp pose, the robot executes a pinch grasp without needing touch sensors or geometric models. In lab experiments, a robotic manipulator equipped with an eye-in-hand RGB-D camera achieved a 100% clearance rate when tasked with picking all trusses from a pile. Of these trusses, 93% were grasped successfully on the first attempt, while the remaining 7% required additional attempts.
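The two reported metrics can be stated precisely with a small helper. This is a sketch under an assumed logging format, not the authors' evaluation code: the function name `grasp_metrics` and the per-truss attempt list are hypothetical. Clearance rate is taken as the fraction of trusses eventually removed, and first-attempt success as the fraction of all trusses removed on the first try, matching the abstract's wording.

```python
from typing import Optional, Sequence, Tuple


def grasp_metrics(attempts_per_truss: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (clearance_rate, first_attempt_success_rate) as fractions.

    Hypothetical log format: attempts_per_truss[i] is the number of grasp
    attempts needed to remove truss i, or None if it was never removed.
    """
    n = len(attempts_per_truss)
    if n == 0:
        raise ValueError("need at least one truss")
    removed = sum(1 for a in attempts_per_truss if a is not None)
    first_try = sum(1 for a in attempts_per_truss if a == 1)
    return removed / n, first_try / n


# Illustrative log (not data from the paper): four trusses, one needing a retry.
print(grasp_metrics([1, 1, 2, 1]))  # prints (1.0, 0.75)
```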
