TARGO: Benchmarking Target-driven Object Grasping under Occlusions
Yan Xia, Ran Ding, Ziyuan Qin, Guanqi Zhan, Kaichen Zhou, Long Yang, Hao Dong, Daniel Cremers
TL;DR
This work tackles target-driven object grasping under occlusion, a key challenge for robots operating in clutter. It introduces TARGO, a benchmark that combines large-scale synthetic data with real-world scenes to analyze how occlusion affects grasping, together with a scalable data-generation pipeline. The authors evaluate five state-of-the-art grasping models, show that their performance degrades as occlusion increases, and propose TARGO-Net, a transformer-based model with a 3D shape completion module that grasps robustly under occlusion on both synthetic and real data. The dataset and code are released to support future research and practical deployment in occluded environments.
Abstract
Recent advances in predicting 6D grasp poses from a single depth image have led to promising performance in robotic grasping. However, existing grasping models struggle in cluttered environments, where nearby objects interfere with grasping the target object. In this paper, we first establish a new benchmark dataset for TARget-driven Grasping under Occlusions, named TARGO. We make the following contributions: 1) We are the first to study how the occlusion level affects grasping. 2) We set up an evaluation benchmark consisting of large-scale synthetic data and a portion of real-world data, and we evaluate five grasp models, finding that even the current SOTA model suffers as the occlusion level increases, so grasping under occlusion remains a challenge. 3) We also generate a large-scale training dataset via a scalable pipeline, which can be used to boost the performance of grasping under occlusion and generalizes to the real world. 4) We further propose TARGO-Net, a transformer-based grasping model with a shape completion module, which performs most robustly as occlusion increases. Our benchmark dataset can be found at https://TARGO-benchmark.github.io/.
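The abstract does not specify how the occlusion level is measured. As a minimal sketch only, assuming occlusion is defined as the fraction of the target's pixels hidden by other objects (comparing a mask of the target rendered alone with its visible mask in the cluttered scene, from the same camera), a per-target occlusion level and a success-rate breakdown by occlusion bin could be computed as below. The function names `occlusion_level` and `success_rate_by_bin`, the boolean-mask format and the bin edges are hypothetical illustrations, not the TARGO code or API.

```python
from typing import List, Sequence

# Hypothetical mask format: 2D grid of booleans, True where the target occupies the pixel.
Mask = Sequence[Sequence[bool]]


def occlusion_level(mask_alone: Mask, mask_in_scene: Mask) -> float:
    """Assumed definition: share of the target's pixels hidden once clutter is added."""
    # Pixels covered by the target when it is rendered with no other objects.
    total = sum(bool(v) for row in mask_alone for v in row)
    if total == 0:
        raise ValueError("target is not visible even when rendered alone")
    # Target pixels that remain visible in the cluttered scene.
    visible = sum(
        bool(a) and bool(s)
        for row_a, row_s in zip(mask_alone, mask_in_scene)
        for a, s in zip(row_a, row_s)
    )
    # 0.0 means fully visible, 1.0 means fully hidden.
    return 1.0 - visible / total


def success_rate_by_bin(levels: Sequence[float], successes: Sequence[bool],
                        edges: Sequence[float]) -> List[float]:
    """Grasp success rate per occlusion bin [edges[i], edges[i+1]); last bin is closed."""
    n_bins = len(edges) - 1
    hits = [0] * n_bins
    counts = [0] * n_bins
    for level, ok in zip(levels, successes):
        for i in range(n_bins):
            last = i == n_bins - 1
            if edges[i] <= level < edges[i + 1] or (last and level == edges[-1]):
                counts[i] += 1
                hits[i] += int(ok)
                break
    # Empty bins report NaN instead of a misleading 0.0 success rate.
    return [h / c if c else float("nan") for h, c in zip(hits, counts)]


if __name__ == "__main__":
    alone = [[True, True], [True, True]]
    scene = [[True, False], [False, False]]
    print(occlusion_level(alone, scene))  # 0.75
    print(success_rate_by_bin([0.1, 0.2, 0.7, 0.9],
                              [True, True, True, False],
                              [0.0, 0.5, 1.0]))  # [1.0, 0.5]
```

Binning trials by occlusion level in this way is one simple means of exposing the trend the abstract reports, namely that success rates fall as occlusion grows; the actual benchmark may define and bin occlusion differently.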
