Task-Oriented 6-DoF Grasp Pose Detection in Clutters

An-Lan Wang, Nuo Chen, Kun-Yu Lin, Li Yuan-Ming, Wei-Shi Zheng

TL;DR

This work tackles Task-Oriented 6-DoF Grasp Pose Detection in Clutters (TO6DGC) by introducing the 6-DoF Task Grasp (6DTG) dataset and a one-stage baseline, OSTG, that directly predicts task-oriented 6-DoF grasps. The method combines a task-oriented point selection mechanism with a task-guided grasp pose generator, informed by a novel 6-DoF grasp representation and a Gram-Schmidt-based orthonormalization to ensure feasible rotations. On 6DTG, OSTG outperforms two-stage and one-stage baselines across coverage and success metrics, and real-robot experiments corroborate improved perception of task-oriented grasp points and 6-DoF poses. The dataset and approach offer a practical path toward assistive robots that perform goal-directed manipulation in cluttered environments.

Abstract

In general, humans grasp an object differently for different tasks, e.g., "grasping the handle of a knife to cut" vs. "grasping the blade to hand over". In robotic grasp pose detection research, some existing works have considered task-oriented grasping and made some progress, but they are generally constrained to low-DoF gripper types or non-cluttered settings, which limits their applicability to human assistance in real life. Aiming for more general and practical grasp models, in this paper we investigate the problem of Task-Oriented 6-DoF Grasp Pose Detection in Clutters (TO6DGC), which extends the task-oriented problem to the more general setting of 6-DoF grasp pose detection in cluttered (multi-object) scenes. To this end, we construct a large-scale 6-DoF task-oriented grasping dataset, 6-DoF Task Grasp (6DTG), which features 4391 cluttered scenes with over 2 million 6-DoF grasp poses. Each grasp is annotated with a specific task, involving 6 tasks and 198 objects in total. Moreover, we propose One-Stage TaskGrasp (OSTG), a strong baseline to address the TO6DGC problem. Our OSTG adopts a task-oriented point selection strategy to detect where to grasp, and a task-oriented grasp generation module to decide how to grasp given a specific task. To evaluate the effectiveness of OSTG, extensive experiments are conducted on 6DTG. The results show that our method outperforms various baselines on multiple metrics. Real robot experiments also verify that our OSTG has a better perception of the task-oriented grasp points and 6-DoF grasp poses.

Paper Structure

This paper contains 23 sections, 7 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Examples of several existing datasets and our 6-DoF Task Grasp (6DTG) dataset. Compared with the existing task-oriented grasping datasets (a) and (b), 6DTG uses 6-DoF grasp poses and cluttered scenes. Compared with the existing dataset for 6-DoF grasping in clutter (c), 6DTG provides a task annotation for each grasp. Different grasp pose colors represent different tasks in our 6DTG dataset. Best viewed in color.
  • Figure 2: Previous methods (chen2022TOG; tang2023graspgpt) focus on single-object scenarios and use a two-stage pipeline, i.e., they first generate task-irrelevant (stable) grasps and then use an evaluation model to judge whether each grasp is suitable for a particular task. In contrast, we propose a novel one-stage task-guided grasp pose detection model that detects task-oriented grasps in a holistic way.
  • Figure 3: Visualizations of (a) object-level and (b) scene-level task-oriented grasp annotations. Different colored grippers represent different tasks. Best viewed in color.
  • Figure 4: Overview of our proposed One-Stage TaskGrasp (OSTG) model. The figure shows an example in which the model detects grasp poses on the scissors in a cluttered scene that are suitable for the cut task. First, the point encoder-decoder processes $N \times 3$ points and outputs $N' \times C$-dim point features. Supervised by point-wise labels, the Task-oriented Point Selection module selects $M$ points according to objectness, target objectness, and taskness, in a step-by-step manner. These task-related grasp point features are then fed into the Task-Guided Grasp Pose Detection module to generate task-oriented grasps. We additionally add a stable grasp loss to supervise the model. Best viewed in color.
  • Figure 5: We represent a grasp rotation using an approaching vector and a baseline vector. To calculate the grasp loss, we further represent a grasp pose using five points as shown above. Best viewed in color.
  • ...and 1 more figure
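
The TL;DR mentions a Gram-Schmidt-based orthonormalization, and Figure 5 describes a rotation given by an approaching vector and a baseline vector. The following is a minimal sketch of how two such (possibly non-orthogonal, non-unit) vectors could be turned into a valid rotation matrix. It is not the authors' implementation: the function name `rotation_from_two_vectors`, the choice of the approaching vector as the first column, and the right-handed third axis are all assumptions made for illustration.

```python
import math


def _normalize(v):
    """Scale a 3-vector to unit length; reject near-zero vectors."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm < 1e-9:
        raise ValueError("cannot normalize a near-zero vector")
    return [c / norm for c in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def rotation_from_two_vectors(approach, baseline):
    """Build a 3x3 rotation matrix from two raw vectors via Gram-Schmidt.

    Assumed convention (hypothetical, not taken from the paper):
    column 0 = unit approaching direction,
    column 1 = baseline direction made orthogonal to column 0,
    column 2 = cross product of columns 0 and 1 (right-handed frame).
    """
    x = _normalize(approach)
    # Remove the component of the baseline that lies along the approach axis.
    proj = _dot(baseline, x)
    y = _normalize([b - proj * xc for b, xc in zip(baseline, x)])
    # The third axis completes a right-handed orthonormal frame.
    z = _cross(x, y)
    # Return the matrix as a list of rows whose columns are x, y, z.
    return [[x[i], y[i], z[i]] for i in range(3)]


if __name__ == "__main__":
    R = rotation_from_two_vectors([0.0, 0.0, 2.0], [1.0, 0.0, 1.0])
    for row in R:
        print(row)
```

Because the third column is computed as a cross product of two orthonormal columns, the result always has determinant +1, so any pair of non-parallel raw vectors maps to a feasible rotation. Parallel inputs leave no residual after projection and raise a `ValueError` in this sketch.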