Vision-guided Autonomous Dual-arm Extraction Robot for Bell Pepper Harvesting

Kshitij Madhav Bhat, Tom Gao, Abhishek Mathur, Rohit Satishkumar, Francisco Yandun, Dominik Bauer, Nancy Pollard

Abstract

Agricultural robotics has emerged as a critical solution to the labor shortages and rising costs associated with manual crop harvesting. Bell pepper harvesting in particular is labor-intensive, accounting for up to 50% of total production costs. While automated solutions have shown promise in controlled greenhouse environments, harvesting in unstructured outdoor farms remains an open challenge due to environmental variability and occlusion. This paper presents VADER (Vision-guided Autonomous Dual-arm Extraction Robot), a dual-arm mobile manipulation system designed specifically for autonomous bell pepper harvesting in outdoor environments. The system couples a robust perception pipeline with a dual-arm planning framework that coordinates a gripping arm and a cutting arm for extraction. We validate the system through trials under a range of realistic conditions, demonstrating a harvest success rate exceeding 60% with a cycle time under 100 seconds per fruit. A teleoperation fail-safe based on the GELLO teleoperation framework enables recovery from failures. To support perception, we contribute a hierarchically structured dataset of over 3,200 images spanning indoor and outdoor domains. The dataset pairs wide-field scene images with close-up pepper images to enable a coarse-to-fine training strategy, from fruit detection to high-precision pose estimation. The code and dataset will be made publicly available upon acceptance.

Paper Structure (22 sections, 8 equations, 10 figures, 2 tables)

Figures (10)

  • Figure 1: VADER dual-arm harvesting approach. (a) Two RGB-D cameras scan the scene for peppers. (b) Peppers are detected and their poses are estimated. (c) The gripper and cutter manipulators plan coordinated grasping and cutting motions, respectively. (d) The gripper approaches the target pepper. (e) The cutter aligns with the peduncle. (f) The pepper is separated from the plant. (g) The gripper transfers the harvested pepper to the onboard storage bin. (h) Complete system overview of VADER mounted on a mobile platform.
  • Figure 2: Overview of VADER. (a) A Warthog UGV base carries two UFactory XArm7 manipulators, one with a tendon-actuated 3-fingered soft gripper and one with a peduncle cutter, both built from 3D-printed parts. (b) Autonomous pipeline: palm-mounted RealSense D405 cameras feed YOLOv11 segmentation and superellipsoid pose estimation into a collision-aware motion planner, which coordinates the gripper and cutter for reliable harvest and transfer to storage. (c) GELLO-based dual-arm teleoperation enables failure recovery.
  • Figure 3: Pepper pose estimation: We use a fitted superellipsoid to refine the position of the pepper and its orientation based on the peduncle axis.
  • Figure 4: Parametric Circle Grasp Selection: Points are sampled on the circle perpendicular to the pepper axis, and the point with the smallest distance to the robot base is selected.
  • Figure 5: Overall harvesting logic and planning strategies. Motions highlighted in green are planned and executed in parallel.
  • ...and 5 more figures
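
The grasp selection described in the Figure 4 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_grasp_point`, the grasp-circle `radius`, the sample count `n_samples`, and the use of plain Euclidean distance to the base are all assumptions made for this example. It samples points on a circle perpendicular to the pepper axis and keeps the one closest to the robot base.

```python
import numpy as np


def select_grasp_point(center, axis, radius, base=(0.0, 0.0, 0.0), n_samples=64):
    """Return (point, distance) for the sampled circle point nearest the base.

    All parameters are hypothetical: center is the pepper position, axis the
    pepper axis, radius the grasp-circle radius, base the robot base position.
    """
    center = np.asarray(center, dtype=float)
    base = np.asarray(base, dtype=float)
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)

    # Pick a helper vector that is not parallel to the axis, then build two
    # orthonormal vectors u, v spanning the plane perpendicular to the axis.
    helper = np.array([1.0, 0.0, 0.0]) if abs(axis[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(axis, helper)
    u = u / np.linalg.norm(u)
    v = np.cross(axis, u)

    # Parametric circle: p(t) = center + radius * (cos(t) * u + sin(t) * v)
    t = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    points = center + radius * (np.outer(np.cos(t), u) + np.outer(np.sin(t), v))

    # Keep the sample with the smallest Euclidean distance to the robot base.
    dists = np.linalg.norm(points - base, axis=1)
    best = int(np.argmin(dists))
    return points[best], float(dists[best])


if __name__ == "__main__":
    point, dist = select_grasp_point(
        center=(0.5, 0.0, 0.3), axis=(0.0, 0.0, 1.0), radius=0.05, base=(0.0, 0.0, 0.3)
    )
    print(point, dist)
```

A real planner would likely also check reachability and collisions for each candidate before committing to a grasp; the sketch covers only the distance criterion named in the caption.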