TARGO: Benchmarking Target-driven Object Grasping under Occlusions

Yan Xia; Ran Ding; Ziyuan Qin; Guanqi Zhan; Kaichen Zhou; Long Yang; Hao Dong; Daniel Cremers

TL;DR

This work tackles target-driven object grasping under occlusion, a key challenge for robots operating in clutter. It introduces TARGO, a benchmark with large-scale synthetic data and real-world scenes for analyzing how occlusion affects grasping, together with a scalable data-generation pipeline. The authors evaluate five state-of-the-art models, show that their performance degrades as occlusion increases, and propose TARGO-Net, a transformer-based model with a 3D shape completion module that grasps robustly under occlusion on both synthetic and real data. The dataset and code are released to support future research and practical deployment in occluded environments.

Abstract

Recent advances in predicting 6D grasp poses from a single depth image have led to promising performance in robotic grasping. However, previous grasping models struggle in cluttered environments, where nearby objects interfere with grasping the target object. In this paper, we first establish a new benchmark dataset for TARget-driven Grasping under Occlusions, named TARGO. We make the following contributions: 1) We are the first to study how the occlusion level affects grasping. 2) We set up an evaluation benchmark consisting of large-scale synthetic data and some real-world data; evaluating five grasp models on it, we find that even the current SOTA model suffers as the occlusion level increases, so grasping under occlusion remains a challenge. 3) We also generate a large-scale training dataset via a scalable pipeline, which can be used to boost grasping performance under occlusion and generalizes to the real world. 4) We further propose TARGO-Net, a transformer-based grasping model with a shape completion module, which performs most robustly as occlusion increases. Our benchmark dataset can be found at https://TARGO-benchmark.github.io/.
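To make the occlusion-level analysis concrete, here is a minimal Python sketch of how a per-target occlusion level could be computed, assigned to the 0.1-wide bins that the figure captions use ([0, 0.1) to [0.8, 0.9)), and used to report grasp success per bin. The source does not define the occlusion level here, so the sketch assumes it is the fraction of the target's pixels (visible when the target is rendered alone from the same viewpoint) that are hidden in the cluttered scene. The functions `occlusion_stats` and `success_rate_per_bin` are hypothetical names, not part of the TARGO release.

```python
import numpy as np
from collections import defaultdict

N_BINS = 10  # width-0.1 bins: [0, 0.1), ..., [0.8, 0.9), [0.9, 1.0]


def occlusion_stats(mask_clutter, mask_alone, n_bins=N_BINS):
    """Return (occlusion level, bin index) for one target.

    Hypothetical definition: level = 1 - visible_in_clutter / visible_alone,
    where both boolean masks show the target from the same camera, rendered
    with and without the surrounding objects.
    """
    alone = int(np.count_nonzero(mask_alone))
    if alone == 0:
        raise ValueError("target is not visible even without occluders")
    visible = int(np.count_nonzero(mask_clutter & mask_alone))
    hidden = alone - visible
    # Integer arithmetic keeps bin edges exact (float 0.3 // 0.1 gives 2.0, not 3.0).
    bin_idx = min(hidden * n_bins // alone, n_bins - 1)
    return hidden / alone, bin_idx


def success_rate_per_bin(records):
    """Map each occlusion bin to its grasp success rate.

    records: iterable of (bin_idx, succeeded) pairs, one per grasp trial.
    """
    trials, successes = defaultdict(int), defaultdict(int)
    for bin_idx, succeeded in records:
        trials[bin_idx] += 1
        successes[bin_idx] += int(succeeded)
    return {b: successes[b] / trials[b] for b in sorted(trials)}


# Usage: a 2x5 target whose two left columns are hidden by another object.
mask_alone = np.ones((2, 5), dtype=bool)
mask_clutter = mask_alone.copy()
mask_clutter[:, :2] = False
level, bin_idx = occlusion_stats(mask_clutter, mask_alone)
print(level, bin_idx)  # 0.4 4
print(success_rate_per_bin([(4, True), (4, False), (1, True)]))  # {1: 1.0, 4: 0.5}
```

Computing the bin from integer pixel counts rather than from the floating-point level avoids targets landing in the wrong bin exactly at an edge such as 0.3.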

Paper Structure (33 sections, 4 equations, 17 figures, 3 tables)

Introduction
Related Work
Benchmarking Occlusion in Grasping
Design Parameters
Benchmark Datasets
Target-driven Grasping Models
Evaluation Metric
Preliminary Benchmark Analysis
Our TARGO-Net
Problem formulation
Model Architecture
3D Instance Segmentation
3D Shape Completion
Grasp Detection Network
Loss Functions
...and 18 more sections

Figures (17)

Figure 1: Examples from our TARGO-Synthetic training/test sets and the TARGO-Real dataset, with occlusion levels from $[0, 0.1)$ to $[0.8, 0.9)$. The target object to grasp is shown in red.
Figure 2: Target counts in our TARGO under different occlusion levels.
Figure 3: Model architecture of our TARGO-Net, consisting of instance segmentation, shape completion, and grasp detection modules. Grasp detection. We first concatenate the cluttered scene point cloud and the target point cloud, feeding them into a shared 1D CNN layer. Next, we quantize the scene and target point features separately. Using a Cross-Attention Transformer and Sparse2Dense operations, we obtain 3D feature grids, which are then projected, pooled, and processed using a 2D U-Net to obtain 2D feature grids. Finally, an affordance implicit function is applied to obtain 6-DoF grasps and the corresponding grasp quality. A sampling technique is employed to select and refine the optimal grasps.
Figure 4: Comparisons of our TARGO-Net and baselines.
Figure 5: Effects of data augmentation with single scenes.
...and 12 more figures
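The grasp-detection stage described in the Figure 3 caption can be sketched in NumPy as below. This is an illustrative sketch, not TARGO-Net's implementation: random weights stand in for learned ones, a per-point linear layer with ReLU plays the shared 1D CNN (a kernel-size-1 convolution), and single-head attention without learned projections stands in for the Cross-Attention Transformer. The scene/target indicator channel, the choice that scene voxels attend to target voxels, the grid size, and all function names (including `sparse_to_dense` for the caption's Sparse2Dense step) are assumptions. The 2D U-Net, affordance implicit function, and grasp sampling are omitted.

```python
import numpy as np


def shared_pointwise_layer(points, weight, bias):
    """Kernel-size-1 1D convolution + ReLU: one linear map shared by every point."""
    return np.maximum(points @ weight + bias, 0.0)


def quantize(xyz, feats, voxel_size):
    """Average features of points in the same voxel; return sparse voxel coords and features."""
    coords = np.floor(xyz / voxel_size).astype(np.int64)
    uniq, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # some NumPy versions return a 2-D inverse when axis is given
    sums = np.zeros((len(uniq), feats.shape[1]))
    np.add.at(sums, inverse, feats)
    counts = np.bincount(inverse, minlength=len(uniq))[:, None]
    return uniq, sums / counts


def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention without learned projections."""
    scores = queries @ keys_values.T / np.sqrt(queries.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ keys_values


def sparse_to_dense(coords, feats, grid):
    """Scatter sparse voxel features into a dense (grid, grid, grid, C) array."""
    dense = np.zeros((grid, grid, grid, feats.shape[1]))
    dense[coords[:, 0], coords[:, 1], coords[:, 2]] = feats
    return dense


def project_and_pool(dense):
    """Average-pool the 3D grid along each axis to get three 2D feature planes."""
    return {"xy": dense.mean(axis=2), "xz": dense.mean(axis=1), "yz": dense.mean(axis=0)}


rng = np.random.default_rng(0)
G, C = 8, 16  # hypothetical grid resolution and feature width
scene_xyz = rng.random((500, 3))                # stand-in scene points in the unit cube
target_xyz = 0.35 + 0.3 * rng.random((100, 3))  # stand-in target points

# Concatenate both clouds, tagging each point with a scene(0)/target(1) channel.
points = np.vstack([
    np.hstack([scene_xyz, np.zeros((len(scene_xyz), 1))]),
    np.hstack([target_xyz, np.ones((len(target_xyz), 1))]),
])
feats = shared_pointwise_layer(points, rng.normal(size=(4, C)), np.zeros(C))
scene_f, target_f = feats[: len(scene_xyz)], feats[len(scene_xyz):]

# Quantize scene and target separately, then let scene voxels attend to target voxels.
s_coords, s_vox = quantize(scene_xyz, scene_f, 1.0 / G)
t_coords, t_vox = quantize(target_xyz, target_f, 1.0 / G)
fused = s_vox + cross_attention(s_vox, t_vox)  # residual connection

dense = sparse_to_dense(s_coords, fused, G)
planes = project_and_pool(dense)
print(dense.shape, planes["xy"].shape)  # (8, 8, 8, 16) (8, 8, 16)
```

Averaging along each axis is only one simple pooling choice; max pooling would slot in the same way before the planes are passed to a 2D network.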
