DDGC: Generative Deep Dexterous Grasping in Clutter
Jens Lundell, Francesco Verdoja, Ville Kyrki
TL;DR
DDGC is a generative deep network for fast, collision-free dexterous grasping in clutter. From a single RGB-D image, it predicts the 6-DOF grasp pose $\mathbf{p}$ and the hand configuration $\mathbf{q}$. The method combines scene completion, image encoding, a coarse-to-fine grasp generator, a differentiable finger refinement layer, and a Wasserstein discriminator with dedicated losses to produce multiple high-quality grasps in under a second. Trained entirely on synthetic clutter data, DDGC outperforms Multi-FinGAN and the GraspIt! simulated-annealing planner both in simulation and on real hardware, achieving higher grasp quality, higher clutter-clearance rates, and about $4$–$5\times$ faster sampling. The work demonstrates strong sim-to-real transfer without fine-tuning and, through its scene-aware encoding and coarse-to-fine refinement pipeline, offers a scalable path to practical multi-finger grasping in cluttered environments.
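To make the quantities above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a grasp can be stored as a translation, a roll-pitch-yaw rotation (together standing in for the pose $\mathbf{p}$), and a tuple of joint angles (standing in for $\mathbf{q}$). It also writes out the standard Wasserstein critic and generator objectives on plain lists of scores. The names `Grasp`, `critic_loss`, and `generator_loss` are hypothetical, and the paper's dedicated grasp losses are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import fmean
from typing import Sequence


@dataclass(frozen=True)
class Grasp:
    """Hypothetical container for one sampled grasp (an assumption, not DDGC's API)."""
    translation: tuple[float, float, float]  # hand position, part of the 6-DOF pose p
    rpy: tuple[float, float, float]          # roll, pitch, yaw in radians, rest of p
    joints: tuple[float, ...]                # finger joint angles, the configuration q


def critic_loss(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """Standard Wasserstein critic objective: minimise E[D(fake)] - E[D(real)]."""
    return fmean(fake_scores) - fmean(real_scores)


def generator_loss(fake_scores: Sequence[float]) -> float:
    """Standard Wasserstein generator objective: minimise -E[D(fake)]."""
    return -fmean(fake_scores)


if __name__ == "__main__":
    # Illustrative values only; the joint count is arbitrary here.
    g = Grasp(translation=(0.1, 0.0, 0.3), rpy=(0.0, 1.57, 0.0), joints=(0.2, 0.2, 0.2))
    print(g)
    print(critic_loss([1.0, 3.0], [0.0, 1.0]))
    print(generator_loss([0.0, 1.0]))
```

In a real training loop the scores would come from a critic network evaluated on ground-truth and generated grasps, and the critic would additionally need a Lipschitz constraint such as weight clipping or a gradient penalty.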
Abstract
Recent advances in multi-fingered robotic grasping have enabled fast 6-Degrees-Of-Freedom (DOF) single-object grasping. Multi-finger grasping in cluttered scenes, on the other hand, remains mostly unexplored because of the added difficulty of reasoning over obstacles, which greatly increases the computation time needed to generate high-quality collision-free grasps. In this work, we address these limitations by introducing DDGC, a fast generative multi-finger grasp sampling method that can generate high-quality grasps in cluttered scenes from a single RGB-D image. DDGC is built as a network that encodes scene information to produce coarse-to-fine collision-free grasp poses and configurations. We experimentally benchmark DDGC against the simulated-annealing planner in GraspIt! on 1200 simulated cluttered scenes and 7 real-world scenes. The results show that DDGC outperforms the baseline in synthesizing high-quality grasps and removing clutter while being 5 times faster. This, in turn, opens the door to using multi-finger grasps in practical applications, which has so far been limited by the excessive computation time required by other methods.
