Harnessing the Synergy between Pushing, Grasping, and Throwing to Enhance Object Manipulation in Cluttered Scenarios

Hamidreza Kasaei, Mohammadreza Kasaei

TL;DR

The paper investigates how pushing, grasping, and throwing can be synergistically deployed to manipulate cluttered environments. It introduces a modular, model-free RL framework trained in Gazebo that learns push, grasp, and throw policies separately, with perception providing a grasp-quality map and object masks to condition actions. The approach demonstrates strong sim-to-real transfer, achieving over 80% success across tasks and outperforming a DDPG baseline in real-world trials, especially as task complexity grows. This work advances practical cluttered-object manipulation by uniting non-prehensile and prehensile actions in a scalable, trainable pipeline, with potential extensions to common-sense reasoning via LLMs for complex household tasks.

Abstract

In this work, we delve into the intricate synergy among non-prehensile actions like pushing, and prehensile actions such as grasping and throwing, within the domain of robotic manipulation. We introduce an innovative approach to learning these synergies by leveraging model-free deep reinforcement learning. The robot's workflow involves detecting the pose of the target object and the basket at each time step, predicting the optimal push configuration to isolate the target object, determining the appropriate grasp configuration, and inferring the necessary parameters for an accurate throw into the basket. This empowers robots to skillfully reconfigure cluttered scenarios through pushing, creating space for collision-free grasping actions. Simultaneously, we integrate throwing behavior, showcasing how this action significantly extends the robot's operational reach. Ensuring safety, we developed a simulation environment in Gazebo for robot training, applying the learned policy directly to our real robot. Notably, this work represents a pioneering effort to learn the synergy between pushing, grasping, and throwing actions. Extensive experimentation in both simulated and real-robot scenarios substantiates the effectiveness of our approach across diverse settings. Our approach achieves a success rate exceeding 80% in both simulated and real-world scenarios. A video showcasing our experiments is available online at: https://youtu.be/q1l4BJVDbRw

Paper Structure (17 sections, 1 equation, 7 figures, 3 tables)

This paper contains 17 sections, 1 equation, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Overview: The perception system provides essential inputs, including top-down RGB-D views of the workspace, and a mask highlighting the target object. These inputs are then processed through an object-agnostic grasp policy, resulting in pixel-wise grasp synthesis for the scene. Based on the grasp quality of the target object, the system makes a decision between executing a push action or a grasp action. Specifically, if the grasp quality surpasses a predefined threshold, the robot initiates a grasp; otherwise, it proceeds with a push action. After successfully grasping the target object, the robot leverages throwing actions when the target basket is out of its immediate reach.
  • Figure 2: Kinesthetic teaching of the throwing kernel.
  • Figure 3: Visualizing the output of our perception system: world model information is provided through a top-down view, a grasp quality map, and a mask of the target object. The robot's workspace is outlined by the green rectangle, and the predicted push action is shown by the green arrow.
  • Figure 4: Our experimental setups: (from left to right) Training the throwing policy, the push-and-grasp policy, integrating all policies into a unified robotic system, and the real dual-arm robot setup.
  • Figure 5: We created 10 cubic scenarios to assess the efficacy of the acquired policies in real robot experiments.
  • ...and 2 more figures
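
The decision rule in the Figure 1 caption (grasp when the target's grasp quality exceeds a threshold, otherwise push, then throw when the basket is out of reach) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names, the threshold value, the reach radius, the use of the maximum masked score as the target's grasp quality, and the "place" fallback for a reachable basket are all assumptions made for the example.

```python
import numpy as np

# Hypothetical values: the source only says the threshold is "predefined"
# and that throwing is used when the basket is out of immediate reach.
GRASP_QUALITY_THRESHOLD = 0.5
REACH_RADIUS_M = 0.8


def target_grasp_quality(quality_map: np.ndarray, target_mask: np.ndarray) -> float:
    """Best pixel-wise grasp score among pixels belonging to the target object.

    Taking the maximum is an assumption; returns 0.0 if the mask is empty.
    """
    scores = quality_map[target_mask]
    return float(scores.max()) if scores.size else 0.0


def select_action(quality_map: np.ndarray, target_mask: np.ndarray,
                  threshold: float = GRASP_QUALITY_THRESHOLD) -> str:
    """Grasp if the target's grasp quality exceeds the threshold, else push."""
    quality = target_grasp_quality(quality_map, target_mask)
    return "grasp" if quality > threshold else "push"


def select_delivery(basket_xy: tuple, reach_radius: float = REACH_RADIUS_M) -> str:
    """After a successful grasp, throw only if the basket is out of reach.

    basket_xy is the basket position relative to the robot base, in metres.
    The "place" branch is an assumed fallback that the source does not describe.
    """
    distance = float(np.hypot(basket_xy[0], basket_xy[1]))
    return "throw" if distance > reach_radius else "place"


if __name__ == "__main__":
    quality_map = np.zeros((4, 4))
    target_mask = np.zeros((4, 4), dtype=bool)
    target_mask[1, 1] = True
    quality_map[1, 1] = 0.9   # graspable pixel on the target
    quality_map[0, 0] = 0.99  # high score on a non-target pixel, ignored by the mask
    print(select_action(quality_map, target_mask))
    print(select_delivery((1.0, 0.5)))
```

Restricting the quality map to the target mask before thresholding means a highly graspable neighbouring object does not trigger a grasp, so the sketch keeps pushing until the target itself is sufficiently isolated.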