Task-Aware Robotic Grasping by evaluating Quality Diversity Solutions through Foundation Models

Aurel X. Appius; Emiland Garrabe; Francois Helenon; Mahdi Khoramshahi; Mohamed Chetouani; Stephane Doncieux

TL;DR

The paper tackles task-aware robotic grasping by combining semantic part segmentation with geometric reasoning, using Large Language Models (LLMs) and Quality Diversity (QD) algorithms to produce zero-shot task-conditioned grasps. It introduces an open-vocabulary 3D segmentation pipeline, uses an LLM to identify which subparts are graspable and relevant to the task, and scores the grasps in a QD-generated archive with a task-compatibility function $C(g,\mathcal{T})$ to select the best one. On a subset of the YCB dataset with a Franka Emika Panda robot, it reports a weighted IoU of $73.6\%$ for task-conditioned grasp regions. In an end-to-end validation study, $88\%$ of responses favored the task-aware grasp, a preference that a binomial test finds statistically significant. The approach requires no training and offers a scalable route to task-aligned grasping, with potential extensions to more complex geometries and richer LLM grounding for improved robustness.

Abstract

Task-aware robotic grasping is a challenging problem that requires the integration of semantic understanding and geometric reasoning. This paper proposes a novel framework that leverages Large Language Models (LLMs) and Quality Diversity (QD) algorithms to enable zero-shot task-conditioned grasp synthesis. The framework segments objects into meaningful subparts and labels each subpart semantically, creating structured representations that can be used to prompt an LLM. By coupling semantic and geometric representations of an object's structure, the LLM's knowledge about tasks and which parts to grasp can be applied in the physical world. The QD-generated grasp archive provides a diverse set of grasps, allowing us to select the most suitable grasp for the task. We evaluate the proposed method on a subset of the YCB dataset with a Franka Emika robot. A consolidated ground truth for task-specific grasp regions is established through a survey. Our method achieves a weighted intersection over union (IoU) of 73.6% in predicting task-conditioned grasp regions across 65 task-object combinations. An end-to-end validation study on a smaller subset further confirms the effectiveness of our approach, with 88% of responses favoring the task-aware grasp over the control group. A binomial test shows that participants significantly prefer the task-aware grasp.
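For readers unfamiliar with the binomial test mentioned above, the following is a minimal sketch of an exact one-sided version. It assumes a null hypothesis of no preference ($p_0 = 0.5$); the abstract does not state the null hypothesis, the sidedness of the test, or the number of responses. The counts below (44 of 50, which happens to equal 88%) are purely hypothetical and are not the study's data.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p0: float = 0.5) -> float:
    """Exact one-sided p-value: P(X >= successes) for X ~ Binomial(trials, p0)."""
    return sum(
        comb(trials, k) * p0**k * (1.0 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical counts chosen only so that the ratio is 88%; not from the paper.
hypothetical_favoring = 44
hypothetical_total = 50

p = binomial_p_value(hypothetical_favoring, hypothetical_total)
print(f"one-sided p-value under p0 = 0.5: {p:.3g}")
```

A small p-value here would mean that such a lopsided split is unlikely if respondents had no real preference between the two grasps.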
