Precision Harvesting in Cluttered Environments: Integrating End Effector Design with Dual Camera Perception

Kendall Koe, Poojan Kalpeshbhai Shah, Benjamin Walt, Jordan Westphal, Samhita Marri, Shivani Kamtikar, James Seungbum Nam, Naveen Kumar Uppalapati, Girish Krishnan, Girish Chowdhary

TL;DR

The paper tackles automated fruit harvesting in cluttered high-tunnel environments, where traditional large-form-factor robots struggle with occlusions and limited visibility. It introduces Detect2Grasp, a co-designed system that pairs a global RGB-D camera running a Yolov7 detector with an eye-in-hand camera collocated in the gripper, enabling closed-loop visual servoing. Field and lab experiments show 85% success with an average reach time of 10.98 s in high tunnels, along with robustness to lighting and depth noise, and the system outperforms distal-camera baselines in clutter. The work demonstrates that integrating compact hardware with closed-loop visual feedback can achieve reliable, low-profile harvesting suitable for dense agricultural canopies.

Abstract

Labor shortages in specialty crop industries have created a need for robotic automation to increase agricultural efficiency and productivity. Previous manipulation systems harvest well in uncluttered, structured environments. High tunnel environments are more compact and cluttered, requiring a rethinking of large-form-factor systems and grippers. We propose a novel codesigned framework incorporating a global detection camera and a local eye-in-hand camera that demonstrates precise localization of small fruits via closed-loop visual feedback and reliable error handling. Field experiments in high tunnels show that our system reaches 85.0% of cherry tomato fruit, with an average reach time of 10.98 s.

Paper Structure

This paper contains 15 sections, 7 figures, 1 table.

Figures (7)

  • Figure 1: Our custom pneumatic gripper with a camera collocated with the central axis.
  • Figure 2: Flow diagram of our Detect2Grasp algorithm, which includes berry detection, initial pose calculation, and visual servoing phases: (a) The RGB-D camera scans the area and a Yolov7 detector identifies berries. (b) A plane is established from the depth camera to the detected berry, an initial pose is aligned as detailed in Section \ref{sec:compute_initial_pose}, and the manipulator moves to this pose, recomputing if the berry is not visible. (c) With the berry visible, the system proceeds to visual servoing, employing a Cartesian velocity controller for the approach, keeping the berry centered with an inner loop, and declaring the berry 'reached' when it significantly fills the image.
  • Figure 3: The berries on (a) the periphery and (b) under the canopy are used for testing. Peripheral berries are exposed outward while berries under the canopy are recessed towards the stem of the plant. Berries detected by the base camera are marked with orange.
  • Figure 4: Experimental Setups. (a) Base Setup (b) 13x Lighting (c) 20x Lighting (d) Hanging Vine (e) Outdoor High Tunnel
  • Figure 5: The distal depth camera setup used as a baseline. In experiments with this setup, the RGB camera enclosed in the gripper was not used. This gripper successfully grasps berries on the periphery (a), but more often collides with the canopy due to the larger profile (b).
  • ...and 2 more figures
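
The visual servoing phase described in the Figure 2(c) caption can be illustrated with a minimal sketch. This is not the authors' implementation: the `BBox` type, the `servo_step` function, the gains, the thresholds, and the camera-frame convention (x right, y down, z forward) are all assumptions chosen for illustration. The sketch only shows the general pattern the caption names: centre the berry in the eye-in-hand image, approach along the camera axis, and stop once the berry fills enough of the image.

```python
from dataclasses import dataclass


@dataclass
class BBox:
    """Hypothetical detector output: box centre and size in pixels."""
    cx: float
    cy: float
    w: float
    h: float


def servo_step(box, img_w, img_h,
               k_center=0.5,      # assumed proportional gain for centering
               v_approach=0.05,   # assumed forward speed (m/s)
               fill_thresh=0.5,   # assumed image-fill fraction meaning "reached"
               center_tol=0.1):   # assumed normalised centering tolerance
    """Return ((vx, vy, vz), reached) for one control cycle.

    Velocities are expressed in an assumed camera frame:
    x to the right, y downward, z forward along the optical axis.
    """
    # Normalised offset of the berry from the image centre, in [-1, 1].
    ex = (box.cx - img_w / 2) / (img_w / 2)
    ey = (box.cy - img_h / 2) / (img_h / 2)

    # Fraction of the image area covered by the bounding box.
    fill = (box.w * box.h) / (img_w * img_h)
    if fill >= fill_thresh:
        # Berry fills the image: stop and report "reached".
        return (0.0, 0.0, 0.0), True

    # Inner loop: lateral velocity proportional to the centering error.
    vx = k_center * ex
    vy = k_center * ey

    # Advance only while the berry is roughly centred.
    vz = v_approach if max(abs(ex), abs(ey)) <= center_tol else 0.0
    return (vx, vy, vz), False
```

In use, a control loop would call `servo_step` once per camera frame and send the returned velocity to the manipulator's Cartesian velocity interface. Gating the forward motion on the centering error is one simple design choice; a real controller might blend the two motions instead.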