AGILE: Approach-based Grasp Inference Learned from Element Decomposition
MohammadHossein Koosheshi, Hamed Hosseini, Mehdi Tale Masouleh, Ahmad Kalhor, Mohammad Reza Hairi Yazdi
TL;DR
AGILE tackles robotic grasping by leveraging hand-object approach information and explicit object element decomposition. It proposes a two-stage pipeline with a Mask R-CNN-based element decomposer and an approach-conditioned grasp detector that regresses the grasp rectangle $(x,y,\theta,w)$, trained on a novel CoppeliaSim dataset with 10 objects and element masks. In simulation, the method achieves 90% success on seen objects and 78% on unseen objects, and sim-to-real adaptation yields about 70% physical grasp success on a Delta parallel robot with a 2-finger gripper, demonstrating notable generalization and a path toward real-world deployment. The work contributes a public dataset and a practical pipeline for approach-aware, element-based grasp inference, while identifying improvements such as multi-view sensing and larger object sets to close the sim-to-real gap.
Abstract
Humans are adept at grasping, and they choose a grasp by taking into account the position of the hand relative to the object. This work proposes a method that enables a robot manipulator to do the same: grasp an object in the best way given how the gripper has approached it. Built on deep learning, the proposed method consists of two main stages. To generalize to unseen objects, the proposed Approach-based Grasping Inference first applies an element decomposition stage that splits an object into its main parts, each with one or more annotated grasps for a particular approach of the gripper. A grasp detection network then uses the elements decomposed by Mask R-CNN, together with information on the gripper's approach, to detect the element the gripper has approached and the optimal grasp on it. To train the networks, the study introduces a robotic grasping dataset collected in the CoppeliaSim simulation environment. The dataset comprises 10 different objects with annotated element decomposition masks and grasp rectangles. The proposed method achieves a 90% grasp success rate on seen objects and 78% on unseen objects in the CoppeliaSim simulation environment. Lastly, simulation-to-reality domain adaptation is performed by applying transformations to the training set collected in simulation and augmenting the dataset, which yields a 70% physical grasp success rate using a Delta parallel robot and a 2-fingered gripper.
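To make the grasp rectangle $(x,y,\theta,w)$ from the TL;DR concrete, the following minimal sketch converts such a tuple into the two fingertip positions of a 2-fingered gripper. The conventions here are assumptions for illustration and are not taken from the paper: $(x, y)$ is taken as the grasp center in image coordinates, $\theta$ as the angle of the gripper opening axis in radians measured from the image x-axis, and $w$ as the opening width in the same units as $x$ and $y$. The function name `grasp_endpoints` is hypothetical.

```python
import math


def grasp_endpoints(x, y, theta, w):
    """Return the two fingertip positions for a grasp (x, y, theta, w).

    Assumed conventions (not confirmed by the paper):
      - (x, y) is the grasp center,
      - theta is in radians, measured from the x-axis,
      - w is the gripper opening width, in the same units as x and y.
    """
    # Half of the opening width, projected onto each axis.
    dx = 0.5 * w * math.cos(theta)
    dy = 0.5 * w * math.sin(theta)
    # The fingertips sit symmetrically on either side of the center.
    return (x - dx, y - dy), (x + dx, y + dy)


if __name__ == "__main__":
    left, right = grasp_endpoints(10.0, 20.0, 0.0, 4.0)
    print(left, right)
```

Under these assumptions, a grasp with $\theta = 0$ places the fingertips to the left and right of the center along the x-axis, and rotating $\theta$ rotates both fingertips about the center while keeping their separation equal to $w$.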
