XGrasp: Gripper-Aware Grasp Detection with Multi-Gripper Data Generation

Yeonseo Lee, Jungwook Mun, Hyosup Shin, Guebin Hwang, Junhee Nam, Taeyeop Lee, Sungho Jo

TL;DR

XGrasp tackles the challenge of grasp detection across multiple gripper types by systematically augmenting existing datasets with multi-gripper annotations and proposing a real-time two-stage architecture. The Grasp Point Predictor (GPP) handles global scene and gripper information to select candidate locations, while the Angle-Width Predictor (AWP) refines the grasp angle and width using local features and a contrastive learning objective that enables zero-shot generalization to unseen grippers. Empirical results on the Jacquard dataset, simulation, and real-world experiments demonstrate competitive grasp success rates across diverse grippers and substantially faster inference than prior gripper-aware methods. The approach also integrates with vision foundation models, highlighting practical potential for scalable, language-enabled robotic grasping, and points to future work on extending to $6$-DOF grasping and 3D spaces.

Abstract

Most robotic grasping methods are designed for a single gripper type, which limits their applicability in real-world scenarios requiring diverse end-effectors. We propose XGrasp, a real-time gripper-aware grasp detection framework that efficiently handles multiple gripper configurations. The proposed method addresses data scarcity by systematically augmenting existing datasets with multi-gripper annotations. XGrasp employs a hierarchical two-stage architecture. In the first stage, a Grasp Point Predictor (GPP) identifies optimal locations using global scene information and gripper specifications. In the second stage, an Angle-Width Predictor (AWP) refines the grasp angle and width using local features. Contrastive learning in the AWP module enables zero-shot generalization to unseen grippers by learning fundamental grasping characteristics. The modular framework integrates seamlessly with vision foundation models, providing pathways for future vision-language capabilities. The experimental results demonstrate competitive grasp success rates across various gripper types, while achieving substantial improvements in inference speed compared to existing gripper-aware methods. Project page: https://sites.google.com/view/xgrasp
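
The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `gpp` and `awp` callables, the patch size, and the assumption that the second stage scores a discrete grid of candidate angles and widths are all hypothetical choices made for illustration. The networks themselves are replaced by arbitrary callables, so only the control flow (global point selection, then local angle and width refinement) is shown.

```python
import numpy as np

def select_grasp_point(quality_map):
    """Stage 1 stand-in: return the (row, col) with the highest grasp quality."""
    row, col = np.unravel_index(np.argmax(quality_map), quality_map.shape)
    return int(row), int(col)

def crop_patch(image, center, size):
    """Cut a size x size window around `center`, zero-padding at the borders."""
    half = size // 2
    padded = np.pad(image, half, mode="constant")
    r, c = center
    # After padding, original pixel (r, c) sits at (r + half, c + half).
    return padded[r:r + size, c:c + size]

def select_angle_width(score_grid, angles, widths):
    """Stage 2 stand-in: pick the best (angle, width) from a score grid."""
    a_idx, w_idx = np.unravel_index(np.argmax(score_grid), score_grid.shape)
    return float(angles[a_idx]), float(widths[w_idx])

def detect_grasp(image, gripper_spec, gpp, awp, angles, widths, patch_size=5):
    """Hypothetical two-stage pipeline.

    gpp(image, gripper_spec) -> (H, W) quality map        (assumed interface)
    awp(patch, gripper_spec) -> (len(angles), len(widths)) (assumed interface)
    """
    quality_map = gpp(image, gripper_spec)          # global scene + gripper
    point = select_grasp_point(quality_map)
    patch = crop_patch(image, point, patch_size)    # local features only
    score_grid = awp(patch, gripper_spec)
    angle, width = select_angle_width(score_grid, angles, widths)
    return {"point": point, "angle": angle, "width": width}
```

Splitting the work this way means the more expensive per-candidate scoring runs only on a small local patch rather than on the full image, which is one plausible reading of why a two-stage design can help inference speed.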

Paper Structure

This paper contains 20 sections, 1 equation, 11 figures, 5 tables.

Figures (11)

  • Figure 1: XGrasp framework for gripper-aware grasp detection.
  • Figure 2: Gripper Input Generation Process. (a) Gripper input generation pipeline using Isaac Sim. (b) Generated 2-channel gripper inputs for $N_a \times N_w$ actions. Red: Gripper Mask, Blue: Gripper Path.
  • Figure 3: Graspability Decision Rule.
  • Figure 4: The overall pipeline for generating target gripper grasp annotations.
  • Figure 5: Overview of the XGrasp framework: a proposed two-stage Gripper-Aware Grasp Detection approach.
  • ...and 6 more figures