Diffusion for Multi-Embodiment Grasping

Roman Freiberg, Alexander Qualmann, Ngo Anh Vien, Gerhard Neumann

TL;DR

We present an approach based on equivariant diffusion that encodes scenes containing graspable objects in a gripper-agnostic way and decodes grasp poses in a gripper-aware way by integrating gripper geometry into the model. It outperforms both single-gripper and multi-gripper state-of-the-art methods.

Abstract

Grasping is a fundamental skill in robotics with diverse applications across medical, industrial, and domestic domains. However, current approaches for predicting valid grasps are often tailored to specific grippers, limiting their applicability when gripper designs change. To address this limitation, we explore the transfer of grasping strategies between various gripper designs, enabling the use of data from diverse sources. In this work, we present an approach based on equivariant diffusion that facilitates gripper-agnostic encoding of scenes containing graspable objects and gripper-aware decoding of grasp poses by integrating gripper geometry into the model. We also develop a dataset generation framework that produces cluttered scenes with variable-sized object heaps, improving the training of grasp synthesis methods. Experimental evaluation on diverse object datasets demonstrates the generalizability of our approach across gripper architectures, ranging from simple parallel-jaw grippers to humanoid hands, outperforming both single-gripper and multi-gripper state-of-the-art methods.

Paper Structure

This paper contains 26 sections, 8 equations, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: Overview of the architecture. (a) Point cloud scans of the scene with objects and of the gripper in open and closed configurations are equivariantly encoded into a point cloud feature space. (b) The diffusion process computes a pre-grasp pose for the given gripper and scene, using a feature query for the current pose estimate.
  • Figure 2: Overview of used grippers. (a) Robotiq 2F-85, (b) Franka Emika Gripper, (c) Google Bot Gripper, (d) Rethink Gripper, (e) ViperX 300s Gripper, (f) Allegro, (g) Shadow DEX-EE Hand, (h) Shadow Hand.
  • Figure 3: Count of grippers able to grasp an object. Grasp generation uses an antipodal sampling strategy for all gripper types, while preserving some gripper-specific properties.
  • Figure 4: Collision models for grasp scenes. Rendering of collision models in a grasping scene for various gripper types. Each subfigure illustrates an example of a generated grasp pose specific to the gripper type depicted.
  • Figure 5: Real-world setup. (a) The Kassow KR 1205 robot with 7 axes, equipped with a Schunk WSG-32 gripper and a Schunk tool changer for gripper exchange. A top-down RealSense camera is used to capture the scene's point cloud. (b) Visualizations of point clouds including generated grasps for the WSG-32 and Robotiq 2F-85 grippers. Note that the grippers and objects are not part of the training dataset.