Find the Fruit: Zero-Shot Sim2Real RL for Occlusion-Aware Plant Manipulation

Nitesh Subedi, Hsin-Jung Yang, Devesh K. Jha, Soumik Sarkar

TL;DR

This work tackles occlusion-heavy fruit harvesting with deformable plants by learning a policy in a high-fidelity FEM-based simulator and transferring it zero-shot to real hardware. The approach decouples high-level kinematic planning from a low-level compliant controller and uses domain randomization to bridge the sim-to-real gap, aided by training-time privileged fruit masks. Results show up to 86.7% real-world success across stemmed plants and demonstrate generalization to sequential multi-fruit exposure, with ablations illustrating the importance of simulation fidelity and training aids. The study suggests that structurally faithful abstract plant models, coupled with robust perception and compliant actuation, can enable scalable autonomous harvesting in cluttered, deformable environments.

Abstract

Autonomous harvesting in the open presents a complex manipulation problem. In most scenarios, an autonomous system must cope with significant occlusion and interact with plants under large structural uncertainty (every plant is different). Perceptual and modeling uncertainty makes it challenging to design reliable manipulation controllers for harvesting, resulting in poor performance during deployment. We present a sim2real reinforcement learning (RL) framework for occlusion-aware plant manipulation, where a policy is learned entirely in simulation to reposition stems and leaves to reveal target fruit(s). In our proposed approach, we decouple high-level kinematic planning from low-level compliant control, which simplifies the sim2real transfer. This decomposition allows the learned policy to generalize across multiple plants with different stiffness and morphology. In experiments with multiple real-world plant setups, our system achieves up to 86.7% success in exposing target fruits, demonstrating robustness to occlusion variation and structural uncertainty.

Paper Structure

This paper contains 19 sections, 1 equation, 6 figures, 1 table.

Figures (6)

  • Figure 1: Top: We present a sim2real approach for occlusion-aware plant manipulation to reveal fruits in cluttered environments. We perform end-to-end RL training on a generic deformable plant in simulation (left), and then deploy the trained policy in a zero-shot manner on a real stemmed plant (right) using the MyBuddy 280 robot (center). Bottom: Frame-by-frame sequences of occlusion removal in simulation (left) and real-world trials (right), showing that the policy trained on an abstract model generalizes to experimental plants.
  • Figure 2: Overview of the framework. (a) Sim: The robot interacts with an abstract plant in Isaac Lab, receiving RGB, depth, and fruit mask inputs (mask used only in training). The RL agent exchanges actions and rewards with the simulator to learn occlusion-aware manipulation. (b) Real: The trained policy is deployed on the MyBuddy 280 robot. An RGB-D camera provides observations, and a low-level controller executes high-level commands for safe interaction with real plants.
  • Figure 3: Plants used for real-world experiments. (a) and (d) show plants with a clearly visible central stem and broad foliage, similar to the simulated morphology. (b) and (c) show reinforced versions of (a): (b) is the double-reinforced version and (c) is the triple-reinforced version. The zoom-in highlights the reinforcement detail. (e) depicts a dense, bushy plant lacking a distinct central stem, presenting a more severe occlusion challenge. Together, these variations enable testing of the policy's robustness across a range of plant geometries and mechanical properties. All plants are artificial.
  • Figure 4: (a) Experimental setup showing predefined fruit locations relative to the robot and the plant, viewed from the rear of the robot (x-z plane). (b) Top-down view (x-y plane) of the five different initial configurations, presented as end-effector positions. (c) Heatmap of successful trials across all plants (I-V) and fruit locations (A-I), where success count represents the number of successful attempts out of five trials for each initial configuration. (d) Heatmap of successful trials across all plants (I-V) and initial configurations ($\alpha$-$\varepsilon$), where success count represents the number of successful attempts out of nine trials for each fruit location. The percentages at the lower left are the success rates for each plant.
  • Figure 5: (a) The modified version of Plant IV: mechanically more compliant while preserving its overall structure and visual complexity. (b) Heatmap of successful trials for Plants IV and V across fruit locations (A-F), where success count represents the number of successful attempts out of five trials for each initial configuration. (c) Training performance with and without access to the ground-truth fruit mask. The mask accelerates learning and improves final returns. (d) Evaluation results when deploying the policy trained with the mask: performance remains consistent even when the mask is removed at test time.
  • ...and 1 more figure
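
The Figure 4 caption describes a grid of 9 fruit locations by 5 initial configurations, i.e. 45 trials per plant. The sketch below shows one way such a grid could be reduced to the per-location counts of panel (c), the per-configuration counts of panel (d), and a per-plant success rate. The grid values are placeholders chosen for illustration and are not data from the paper; `summarize_trials` is a hypothetical helper name. The placeholder grid contains 39 successes, which happens to give 39/45 ≈ 86.7%; whether the reported best-case figure arises from exactly this count is an assumption.

```python
def summarize_trials(trials):
    """Summarize a binary success grid.

    trials[i][j] is 1 if the trial for fruit location i and initial
    configuration j succeeded, else 0 (hypothetical layout).
    """
    n_locations = len(trials)
    n_configs = len(trials[0])
    # Successes per fruit location, out of n_configs trials (panel (c) style).
    per_location = [sum(row) for row in trials]
    # Successes per initial configuration, out of n_locations trials (panel (d) style).
    per_config = [sum(row[j] for row in trials) for j in range(n_configs)]
    # Overall success rate for the plant.
    rate = sum(per_location) / (n_locations * n_configs)
    return per_location, per_config, rate


# Placeholder grid: 9 locations (A-I) x 5 configurations, all successes...
trials = [[1] * 5 for _ in range(9)]
# ...except six illustrative failures (NOT taken from the paper).
for loc, cfg in [(0, 0), (0, 4), (3, 2), (6, 1), (8, 0), (8, 4)]:
    trials[loc][cfg] = 0

per_location, per_config, rate = summarize_trials(trials)
print("per location:", per_location)
print("per configuration:", per_config)
print(f"success rate: {100 * rate:.1f}%")
```

Both marginal views sum to the same total number of successes, so the per-plant rate can be cross-checked from either heatmap.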