Precision Harvesting in Cluttered Environments: Integrating End Effector Design with Dual Camera Perception
Kendall Koe, Poojan Kalpeshbhai Shah, Benjamin Walt, Jordan Westphal, Samhita Marri, Shivani Kamtikar, James Seungbum Nam, Naveen Kumar Uppalapati, Girish Krishnan, Girish Chowdhary
TL;DR
The paper tackles automated fruit harvesting in cluttered high-tunnel environments, where traditional large-form-factor robots struggle because of occlusions and limited visibility. It introduces Detect2Grasp, a co-designed system that pairs an eye-in-hand camera collocated with the end effector with a global RGB-D camera running a YOLOv7 detector; together these form a dual-camera perception pipeline that enables closed-loop visual servoing. In high-tunnel field trials the system reached 85% of target fruit with an average reach time of 10.98 s, and field and lab experiments indicate robustness to lighting variation and depth noise as well as better performance than distal-camera baselines in clutter. The work shows that pairing compact hardware with closed-loop visual feedback can enable reliable, low-profile harvesting in dense agricultural canopies.
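A global RGB-D detector produces 2D bounding boxes, which have to be lifted to a 3D reach target before the arm can begin its approach. One common way to do this is to back-project the box center through pinhole intrinsics using the aligned depth image. The sketch below assumes that approach; the names `Intrinsics`, `robust_depth`, and `backproject_box_center`, the median-window depth filter, and the intrinsics values are all hypothetical and not taken from the paper.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical values in the example)."""
    fx: float
    fy: float
    cx: float
    cy: float


def robust_depth(depth, u, v, half=2):
    """Median of non-zero depth readings (metres) in a (2*half+1)^2 window around (u, v).

    Using a median to suppress speckle and missing (zero) readings is an assumption
    made for this sketch, not the paper's stated method."""
    rows, cols = len(depth), len(depth[0])
    values = [
        depth[r][c]
        for r in range(max(0, v - half), min(rows, v + half + 1))
        for c in range(max(0, u - half), min(cols, u + half + 1))
        if depth[r][c] > 0.0
    ]
    if not values:
        raise ValueError("no valid depth around detection")
    return statistics.median(values)


def backproject_box_center(box, depth, K):
    """Map a detector box (x_min, y_min, x_max, y_max) in pixels to a 3D point
    (x right, y down, z forward, in metres) in the camera frame."""
    x_min, y_min, x_max, y_max = box
    u = round((x_min + x_max) / 2)
    v = round((y_min + y_max) / 2)
    z = robust_depth(depth, u, v)
    x = (u - K.cx) * z / K.fx
    y = (v - K.cy) * z / K.fy
    return (x, y, z)


# Example with hypothetical 640x480 intrinsics and a flat depth image at 0.5 m.
K = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
depth = [[0.5] * 640 for _ in range(480)]
# Roughly 7.5 cm right of and 0.8 cm below the optical axis, 0.5 m away.
print(backproject_box_center((400, 240, 420, 260), depth, K))
```

The resulting point would still need to be transformed from the global camera frame into the robot's base frame before planning a reach; that extrinsic calibration step is omitted here.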
Abstract
Labor shortages in specialty crop industries have created a need for robotic automation to increase agricultural efficiency and productivity. Previous manipulation systems perform well at harvesting in uncluttered, structured environments. High-tunnel environments, however, are more compact and cluttered, which calls for rethinking large-form-factor systems and grippers. We propose a novel co-designed framework that incorporates a global detection camera and a local eye-in-hand camera and demonstrates precise localization of small fruits via closed-loop visual feedback and reliable error handling. Field experiments in high tunnels show that our system reaches, on average, 85.0% of cherry tomato fruit, with an average reach time of 10.98 s.
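The closed-loop visual feedback and error handling described above can be illustrated with a minimal image-based visual servoing loop. This is a sketch under stated assumptions, not the authors' controller: it assumes the eye-in-hand detector returns the fruit's pixel centroid (or `None` when the fruit is occluded), applies a plain proportional law to the pixel error, advances along the optical axis only once the fruit is centered, and treats several consecutive missed detections as a failure to be handed back to the global camera. All names (`ServoConfig`, `servo_command`, `run_servo`, `sim_detect`, `sim_move`) and all gains and thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Pixel = Tuple[float, float]
Velocity = Tuple[float, float, float]


@dataclass
class ServoConfig:
    """Hypothetical tuning values; none of these are reported in the paper."""
    gain: float = 0.001           # lateral speed (m/s) per pixel of centering error
    center_tol_px: float = 10.0   # pixel error below which the fruit counts as centered
    approach_speed: float = 0.05  # forward speed (m/s) along the optical axis once centered
    max_lost_frames: int = 5      # consecutive missed detections tolerated before aborting
    max_steps: int = 500          # hard cap on control iterations


def servo_command(pixel: Pixel, center: Pixel, cfg: ServoConfig) -> Velocity:
    """Proportional image-based law: translate the camera toward the fruit's pixel offset
    (camera x right, y down, z forward) and advance only while the fruit is centered."""
    ex, ey = pixel[0] - center[0], pixel[1] - center[1]
    centered = (ex * ex + ey * ey) ** 0.5 < cfg.center_tol_px
    vz = cfg.approach_speed if centered else 0.0
    return (cfg.gain * ex, cfg.gain * ey, vz)


def run_servo(detect: Callable[[], Optional[Pixel]],
              move: Callable[[Velocity], None],
              reached: Callable[[], bool],
              center: Pixel,
              cfg: Optional[ServoConfig] = None) -> str:
    """Closed loop of detect -> command -> move, with a simple lost-target abort."""
    cfg = cfg or ServoConfig()
    lost = 0
    for _ in range(cfg.max_steps):
        if reached():
            return "reached"
        pixel = detect()
        if pixel is None:
            lost += 1
            move((0.0, 0.0, 0.0))       # hold still while the fruit is occluded
            if lost > cfg.max_lost_frames:
                return "lost_target"    # e.g. hand control back to the global camera
            continue
        lost = 0
        move(servo_command(pixel, center, cfg))
    return "timeout"


# Toy simulation: a fruit 0.4 m ahead of a pinhole camera, offset laterally.
FX, CX, CY, DT = 600.0, 320.0, 240.0, 0.1
fruit = [0.05, -0.03, 0.40]  # fruit position in the camera frame (m)


def sim_detect() -> Pixel:
    x, y, z = fruit
    return (FX * x / z + CX, FX * y / z + CY)


def sim_move(v: Velocity) -> None:
    for i in range(3):
        fruit[i] -= v[i] * DT  # moving the camera by v*DT shifts the fruit the opposite way


status = run_servo(sim_detect, sim_move, reached=lambda: fruit[2] < 0.05, center=(CX, CY))
```

In this toy run the loop first centers the fruit without advancing, then closes the remaining distance while continuing to correct laterally, and `status` ends up as `"reached"`. Gating forward motion on centering keeps the approach roughly along the camera's line of sight to the fruit; the paper's actual control law and recovery behavior may differ.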
