RoboGrasp: A Universal Grasping Policy for Robust Robotic Control

Yiqi Huang, Travis Davies, Jiahuan Yan, Xiang Chen, Yu Tian, Luhui Hu

TL;DR

RoboGrasp tackles the challenge of robust robotic grasping generalization by marrying pretrained grasp detection with a diffusion-based policy. The approach conditions diffusion actions on grasping-box cues and enhanced observation encodings, enabling precise, goal-directed manipulation across diverse objects and layouts. Empirical results across PickBig, PickCup, and PickGoods show substantial improvements in task and grasp success rates, with strong few-shot generalization and affordance-prompt capabilities, while data scale and environmental complexity influence performance. This framework offers a scalable, versatile path toward reliable manipulation in unstructured settings, with potential extensions to language guidance and broader foundation-model integrations.

Abstract

Imitation learning and world models have shown significant promise in advancing generalizable robotic learning, with robotic grasping remaining a critical challenge for achieving precise manipulation. Existing methods often rely heavily on robot arm state data and RGB images, leading to overfitting to specific object shapes or positions. To address these limitations, we propose RoboGrasp, a universal grasping policy framework that integrates pretrained grasp detection models with robotic learning. By leveraging robust visual guidance from object detection and segmentation tasks, RoboGrasp significantly enhances grasp precision, stability, and generalizability, achieving up to 34% higher success rates in few-shot learning and grasping box prompt tasks. Built on diffusion-based methods, RoboGrasp is adaptable to various robotic learning paradigms, enabling precise and reliable manipulation across diverse and complex scenarios. This framework represents a scalable and versatile solution for tackling real-world challenges in robotic grasping.

Paper Structure

This paper contains 20 sections, 1 equation, 9 figures, 1 table.

Figures (9)

  • Figure 1: An overview of the RoboGrasp architecture, showing how grasping guidance, RGB images, and robot state data are integrated to improve the generalizability and precision of grasping manipulation. (a) Data flow and datasets used for training and inference. (b) Hardware setup, including an industrial-grade robotic arm, RealSense cameras, and a Quest VR headset for data collection. (c) Annotation of demonstrations for grasping affordances. (d) Experimental task designs. (e) The RoboGrasp policy architecture.
  • Figure 2: The anatomy of a grasping box: a region on an item that can be grasped, described by the $x$, $y$ coordinates of the box's centroid and the box's width and height.
  • Figure 3: Grasping boxes for cups used in the experiments, shown in a bird's-eye view. (a) illustrates a grasp by the wall of the cup, while (b) and (c) demonstrate grasps by the cup handles. (d) and (e) depict grasps over the cup's diameter. (c) and (e) represent the cups used in the few-shot experiments.
  • Figure 4: Setup of the placement-position generalizability experiment for PickBig. (a) and (b) show two of the eight placement positions. The objective of PickBig is to distinguish between two similarly shaped blocks and successfully grasp the larger one along its diameter.
  • Figure 5: Few-shot experiment setup for the PickCup task. The green mug in (a) represents the handle-grasping few-shot task with only 5 demonstrations. The blue plastic cup in (b) represents the diameter-grasping few-shot task with 10 demonstrations.
  • ...and 4 more figures
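
The grasping box described in Figure 2 (centroid $x$, $y$ plus width and height) could be represented roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the class name `GraspBox`, its field names, and the helper methods are hypothetical, and the box is assumed to be axis-aligned in image pixel coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraspBox:
    """Hypothetical container for a grasping box (assumed axis-aligned, in pixels)."""
    cx: float      # x coordinate of the box centroid
    cy: float      # y coordinate of the box centroid
    width: float   # box width
    height: float  # box height

    def corners(self):
        """Return (x_min, y_min, x_max, y_max) derived from centroid and size."""
        half_w = self.width / 2.0
        half_h = self.height / 2.0
        return (self.cx - half_w, self.cy - half_h,
                self.cx + half_w, self.cy + half_h)

    def normalized(self, image_width, image_height):
        """Scale all four values by the image size (an assumed preprocessing step)."""
        return GraspBox(self.cx / image_width, self.cy / image_height,
                        self.width / image_width, self.height / image_height)


box = GraspBox(cx=320.0, cy=240.0, width=80.0, height=40.0)
print(box.corners())
print(box.normalized(640, 480))
```

Storing the centroid and size, as in the figure, keeps the representation compact; corner coordinates can be derived on demand when a consumer needs them.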