Multi-FinGAN: Generative Coarse-To-Fine Sampling of Multi-Finger Grasps

Jens Lundell, Enric Corona, Tran Nguyen Le, Francesco Verdoja, Philippe Weinzaepfel, Gregory Rogez, Francesc Moreno-Noguer, Ville Kyrki

TL;DR

Multi-FinGAN presents a fast, end-to-end coarse-to-fine framework for multi-finger grasping that directly generates 6D grasps from RGB-D images. It combines a multi-label grasp-type classifier, a refinement network, and a differentiable finger-refinement layer within a GAN-inspired objective (Wasserstein discriminator plus complementary losses) to produce collision-free, realistic grasps without an external planner. Trained solely on synthetic data, it generates grasps in about one second, up to 20–30× faster than the baseline, and improves grasp quality and success rates in both simulation and real-robot experiments. The approach demonstrates robust sim-to-real transfer while highlighting remaining challenges in ranking-based selection and scale sensitivity, pointing to future work in direct joint-angle regression and critic-based evaluation to further accelerate grasp synthesis.

Abstract

While there exist many methods for manipulating rigid objects with parallel-jaw grippers, grasping with multi-finger robotic hands remains a largely unexplored research topic. Reasoning about and planning collision-free trajectories over the additional degrees of freedom of several fingers represents an important challenge that, so far, involves computationally costly and slow processes. In this work, we present Multi-FinGAN, a fast generative multi-finger grasp sampling method that synthesizes high-quality grasps directly from RGB-D images in about a second. We achieve this by training, in an end-to-end fashion, a coarse-to-fine model composed of a classification network that distinguishes grasp types according to a specific taxonomy and a refinement network that produces refined grasp poses and joint angles. We experimentally validate and benchmark our method against a standard grasp-sampling method on 790 grasps in simulation and 20 grasps on a real Franka Emika Panda. All experimental results using our method show consistent improvements both in terms of grasp quality metrics and grasp success rate. Remarkably, our approach is up to 20-30 times faster than the baseline, a significant improvement that opens the door to feedback-based grasp re-planning and task-informative grasping. Code is available at https://irobotics.aalto.fi/multi-fingan/.

Paper Structure

This paper contains 16 sections, 6 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: From an input RGB-D image, Multi-FinGAN generates a diverse set of grasps from all around the object in about a second, and then executes the highest scoring grasp on the real robot.
  • Figure 2: The architecture of Multi-FinGAN.
  • Figure 3: Example of three synthetic RGB images used for training (a), and a grasp generated by our method (b).
  • Figure 4: Histograms showing all results obtained on both datasets by our approach and the baseline in terms of $\epsilon$-quality and interpenetration (best viewed in color).
  • Figure 5: The objects used in the physical experiments.
  • ...and 1 more figures