DeliGrasp: Inferring Object Properties with LLMs for Adaptive Grasp Policies

William Xie, Maria Valentini, Jensen Lavering, Nikolaus Correll

TL;DR

LLM-parameterized, first-principles grasp policies outperform both traditional adaptive grasp policies and direct LLM-as-code policies on a custom benchmark of 12 delicate and deformable items, including food, produce, toys, and other everyday items.

Abstract

Large language models (LLMs) can provide rich physical descriptions of most worldly objects, allowing robots to achieve more informed and capable grasping. We leverage LLMs' common sense physical reasoning and code-writing abilities to infer an object's physical characteristics (mass $m$, friction coefficient $\mu$, and spring constant $k$) from a semantic description, and then translate those characteristics into an executable adaptive grasp policy. Using a two-finger gripper with a built-in depth camera that can control its torque by limiting motor current, we demonstrate that LLM-parameterized but first-principles grasp policies outperform both traditional adaptive grasp policies and direct LLM-as-code policies on a custom benchmark of 12 delicate and deformable items including food, produce, toys, and other everyday items, spanning two orders of magnitude in mass and required pick-up force. We then improve property estimation and grasp performance on variable size objects with model finetuning on property-based comparisons and eliciting such comparisons via chain-of-thought prompting. We also demonstrate how compliance feedback from DeliGrasp policies can aid in downstream tasks such as measuring produce ripeness. Our code and videos are available at: https://deligrasp.github.io
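
As a rough illustration of how the three estimated properties could parameterize a grasp, the sketch below derives a minimum grip force from mass and friction, then the deformation that force would cause in an object modeled as a linear spring. The function names, the simplified Coulomb bound $F \geq mg/\mu$, and the example values are assumptions made for illustration; this is not the paper's controller.

```python
G = 9.81  # gravitational acceleration in m/s^2


def min_grasp_force(mass_kg: float, mu: float) -> float:
    """Lower bound on grip force (N) that prevents slip.

    Assumption: a simplified Coulomb friction model in which the
    friction force mu * F must at least balance the weight m * g.
    """
    if mass_kg <= 0 or mu <= 0:
        raise ValueError("mass_kg and mu must be positive")
    return mass_kg * G / mu


def compression_at_force(force_n: float, k_n_per_m: float) -> float:
    """Deformation (m) of an object treated as a linear spring (Hooke's law)."""
    if k_n_per_m <= 0:
        raise ValueError("k_n_per_m must be positive")
    return force_n / k_n_per_m


if __name__ == "__main__":
    # Hypothetical property estimates, e.g. as an LLM might return them.
    mass_kg, mu, k = 0.2, 0.5, 500.0
    force = min_grasp_force(mass_kg, mu)
    print(f"minimum grip force: {force:.2f} N")
    print(f"expected compression: {compression_at_force(force, k) * 1000:.2f} mm")
```

A softer object (smaller $k$) deforms more under the same force, which is why a compliance estimate matters alongside mass and friction when the goal is to avoid damaging the item.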

Paper Structure

This paper contains 17 sections, 1 equation, 5 figures, 7 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Large language models (LLMs) have rich physical knowledge about worldly objects, but cannot directly reason about robot grasps for them. Paired with open-world localization and pose estimation (left), our method (middle) queries LLMs for the salient physical characteristics of mass, friction, and compliance as the basis for an adaptive grasp controller. DeliGrasp policies successfully grasp delicate and deformable objects (right). These policies also produce compliance feedback as measured spring constants, which we leverage for downstream tasks like picking ripe produce (middle). Fine-tuning on this feedback expands LLM knowledge to bespoke objects.
  • Figure 2: (A) Our experimental setup with a tabletop UR5 robot arm equipped with the MAGPIE Gripper [magpie]. (B) Free body diagram describing gripper interactions with an object at rest, adapted from [adaptive_grasp]. (C) The delicate objects dataset, ranging from 2-900g and spanning various material properties.
  • Figure 3: (A) We compare mass estimates (rows) across different LLMs and prompting strategies (columns), including the base DeliGrasp "Thinker" prompt with GPT-4 (DG 4) and GPT-3.5-Turbo (DG 3.5), GPT-3.5-Turbo finetuned on the PhysObjects dataset (DG FT 3.5), GPT-4 and GPT-3.5-Turbo without finetuning but with chain-of-thought (CoT) physical reasoning prompting (DG CoT 4 and DG CoT 3.5), and GPT-3.5-Turbo with finetuning and CoT prompting (DG FT CoT 3.5). We observe that both finetuning and CoT prompting improve mass estimates, and that the two methods together yield the most improved estimates. We also show how semantic modifiers such as an "empty paper cup" (B) and "paper cup filled with water" (C) result in drastically different estimates of weight (25x) and other meta parameters.
  • Figure 4: DeliGrasp adjusts the grasp force (A) for the verb of "checking" the avocado, from the estimated 3.92 N to 0.5 N. Each grasp measures a spring constant k (B) without damaging the avocados. Such measurements can be used for downstream LLM-reasoning tasks (C) like picking ripe produce or meal planning.
  • Figure : The delicate and deformable objects used for evaluation span from 0.8 to 900g and from soft produce to rigid plastic, and they are commonly grasped in real-world environments like homes, grocery stores, and kitchens. We measure object width, mass, and approximate minimum grasping force, $F_{min}$. "Object Description" inputs are paired with a grasp verb, "pick," to DeliGrasp prompts to generate property estimates and grasp policies. We also qualify what kind of damage or "invalidating deformation" renders a grasp a failure.
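
The compliance feedback mentioned in the Figure 4 caption, a spring constant $k$ measured during a grasp, can be illustrated with a least-squares fit of Hooke's law to force and displacement samples. The sketch below is an assumption about how such a measurement might be computed; the function name `estimate_spring_constant` and the sample values are hypothetical and do not come from the paper.

```python
from typing import Sequence


def estimate_spring_constant(
    displacements_m: Sequence[float], forces_n: Sequence[float]
) -> float:
    """Least-squares estimate of k (N/m) for the model F = k * x.

    Fits a line through the origin: k = sum(F * x) / sum(x * x).
    """
    if len(displacements_m) != len(forces_n) or not displacements_m:
        raise ValueError("need equally sized, non-empty sample lists")
    denom = sum(x * x for x in displacements_m)
    if denom == 0:
        raise ValueError("displacements must not all be zero")
    return sum(f * x for f, x in zip(forces_n, displacements_m)) / denom


if __name__ == "__main__":
    # Hypothetical samples recorded while the gripper closes on an object.
    x = [0.001, 0.002, 0.003]  # finger displacement after contact, in metres
    f = [0.5, 1.0, 1.5]        # applied force, in newtons
    print(f"estimated k: {estimate_spring_constant(x, f):.1f} N/m")
```

Under this model, a riper and softer fruit would yield a smaller fitted $k$ than a firm one at the same applied force, which is the kind of signal a downstream reasoning step could compare across items.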