RoboGrasp: A Universal Grasping Policy for Robust Robotic Control
Yiqi Huang, Travis Davies, Jiahuan Yan, Xiang Chen, Yu Tian, Luhui Hu
TL;DR
RoboGrasp addresses robust, generalizable robotic grasping by combining pretrained grasp detection with a diffusion-based policy. The approach conditions action diffusion on grasping-box cues and enhanced observation encodings, enabling precise, goal-directed manipulation across diverse objects and layouts. Empirical results across PickBig, PickCup, and PickGoods show substantial improvements in task and grasp success rates, with strong few-shot generalization and affordance-prompt capabilities, although performance depends on data scale and environmental complexity. The framework offers a scalable, versatile path toward reliable manipulation in unstructured settings, with potential extensions to language guidance and broader foundation-model integration.
Abstract
Imitation learning and world models have shown significant promise in advancing generalizable robotic learning, but robotic grasping remains a critical challenge for achieving precise manipulation. Existing methods often rely heavily on robot arm state data and RGB images, which leads to overfitting to specific object shapes or positions. To address these limitations, we propose RoboGrasp, a universal grasping policy framework that integrates pretrained grasp detection models with robotic learning. By leveraging robust visual guidance from object detection and segmentation tasks, RoboGrasp significantly enhances grasp precision, stability, and generalizability, achieving up to 34% higher success rates in few-shot learning and grasping box prompt tasks. Built on diffusion-based methods, RoboGrasp is adaptable to various robotic learning paradigms, enabling precise and reliable manipulation across diverse and complex scenarios. The framework represents a scalable and versatile solution for tackling real-world challenges in robotic grasping.
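To make the idea of conditioning a diffusion policy on a grasping box more concrete, the following is a minimal sketch and not the RoboGrasp implementation. It assumes the grasp box arrives as pixel corners (x_min, y_min, x_max, y_max), that the observation encoding is a short vector, and that the conditioning is a plain concatenation of the two. The function names, the 2-D action, the linear noise schedule, and the randomly weighted `toy_noise_predictor` (a stand-in for a trained network) are all hypothetical. The update rule is the standard DDPM ancestral sampling step.

```python
import math
import random


def normalize_box(box, width, height):
    """Scale a pixel-space box (x_min, y_min, x_max, y_max) to [0, 1]."""
    x0, y0, x1, y1 = box
    return [x0 / width, y0 / height, x1 / width, y1 / height]


def build_condition(obs_encoding, box, width, height):
    """Assumed conditioning: observation encoding concatenated with the box."""
    return list(obs_encoding) + normalize_box(box, width, height)


def toy_noise_predictor(action, t, cond, weights):
    """Hypothetical stand-in for a trained network eps_theta(a_t, t, c)."""
    features = list(action) + [t] + list(cond)
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def sample_action(cond, action_dim=2, steps=10, seed=0):
    """Run DDPM reverse steps from Gaussian noise, conditioned on `cond`."""
    rng = random.Random(seed)
    # Linear beta schedule and cumulative products of alpha = 1 - beta.
    betas = [1e-4 + (0.02 - 1e-4) * i / (steps - 1) for i in range(steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    # Random weights: placeholder for learned parameters (assumption).
    n_features = action_dim + 1 + len(cond)
    weights = [[rng.gauss(0, 0.1) for _ in range(n_features)]
               for _ in range(action_dim)]
    action = [rng.gauss(0, 1) for _ in range(action_dim)]
    for t in reversed(range(steps)):
        eps = toy_noise_predictor(action, t / steps, cond, weights)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(a - coef * e) / math.sqrt(alphas[t])
                for a, e in zip(action, eps)]
        # No noise is added on the final step (t == 0).
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0
        action = [m + sigma * rng.gauss(0, 1) for m in mean]
    return action


# Usage: a made-up 2-D observation encoding and a box in a 640x480 image.
cond = build_condition([0.5, -0.3], (64, 48, 128, 96), 640, 480)
print(sample_action(cond))
```

Because the predictor here is untrained, the sampled action carries no meaning. The sketch only shows where a grasp-box cue could enter the denoising loop: with the seed fixed, changing the box changes the predicted noise at every step and therefore the final action.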
