DeliGrasp: Inferring Object Properties with LLMs for Adaptive Grasp Policies

William Xie; Maria Valentini; Jensen Lavering; Nikolaus Correll

TL;DR

LLM-parameterized but first-principles grasp policies outperform both traditional adaptive grasp policies and direct LLM-as-code policies on a custom benchmark of 12 delicate and deformable items, including food, produce, toys, and other everyday items.

Abstract

Large language models (LLMs) can provide rich physical descriptions of most real-world objects, allowing robots to grasp them in a more informed and capable way. We leverage LLMs' common-sense physical reasoning and code-writing abilities to infer an object's physical characteristics (mass $m$, friction coefficient $\mu$, and spring constant $k$) from a semantic description, and then translate those characteristics into an executable adaptive grasp policy. Using a two-finger gripper with a built-in depth camera that can control its torque by limiting motor current, we demonstrate that LLM-parameterized but first-principles grasp policies outperform both traditional adaptive grasp policies and direct LLM-as-code policies on a custom benchmark of 12 delicate and deformable items, including food, produce, toys, and other everyday items, spanning two orders of magnitude in mass and required pick-up force. We then improve property estimation and grasp performance on variable-size objects by finetuning the model on property-based comparisons and by eliciting such comparisons via chain-of-thought prompting. We also demonstrate how compliance feedback from DeliGrasp policies can aid downstream tasks such as measuring produce ripeness. Our code and videos are available at: https://deligrasp.github.io
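The step of translating inferred properties into a grasp policy can be illustrated with a minimal sketch. It is not the DeliGrasp implementation: it assumes a simplified static-friction model in which the total grip force F must satisfy μF ≥ mg, and a hypothetical policy that ramps the commanded force in small steps from a fraction of that estimate up to a safety cap, stopping as soon as a lift succeeds. The function names, the `lift_succeeds` callback, and the ramp fractions are all illustrative assumptions.

```python
from typing import Callable, Optional

G = 9.81  # gravitational acceleration, m/s^2


def min_grasp_force(mass_kg: float, mu: float) -> float:
    """Assumed friction-limited estimate: mu * F >= m * g, so F >= m * g / mu.

    This is a simplified model chosen for illustration; formulations that
    treat each finger contact separately differ by a constant factor.
    """
    if mass_kg <= 0 or mu <= 0:
        raise ValueError("mass and friction coefficient must be positive")
    return mass_kg * G / mu


def adaptive_grasp(
    mass_kg: float,
    mu: float,
    lift_succeeds: Callable[[float], bool],
    start_frac: float = 0.25,
    step_frac: float = 0.1,
    max_frac: float = 1.5,
) -> Optional[float]:
    """Hypothetical force-ramp policy.

    Starts at start_frac * F_min and adds step_frac * F_min per attempt until
    lift_succeeds(force) returns True. Returns the force that worked, or None
    if the cap max_frac * F_min is exceeded first.
    """
    f_min = min_grasp_force(mass_kg, mu)
    force = start_frac * f_min
    cap = max_frac * f_min + 1e-9  # small tolerance for float accumulation
    while force <= cap:
        if lift_succeeds(force):
            return force
        force += step_frac * f_min
    return None


if __name__ == "__main__":
    # Hypothetical object: 0.1 kg, mu = 0.5; lift simulated with the same model.
    m, mu = 0.1, 0.5
    print("F_min estimate (N):", min_grasp_force(m, mu))
    print("force that lifted (N):", adaptive_grasp(m, mu, lambda f: mu * f >= m * G))
```

With these hypothetical values, `min_grasp_force` returns about 1.96 N, and the simulated lift first succeeds at 1.05 times that estimate (about 2.06 N). Starting below the estimate is one way to avoid over-squeezing delicate items when the inferred properties are too high.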

Paper Structure (17 sections, 1 equation, 5 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Methods
Grasp Force Modeling
Delicate Grasping
Improving Property Estimation
Classical Adaptive Grasping Baselines
Experiments
Grasping Atypical Objects
Sensing with DeliGrasp to Pick Ripe Produce
Conclusion
Appendix
Full Details of DeliGrasp Performance on Delicate Objects Dataset
Ripeness Reasoning with LLMs
DeliGrasp Descriptor Prompt
...and 2 more sections

Figures (5)

Figure 1: Large language models (LLMs) have rich physical knowledge about worldly objects but cannot directly reason about robot grasps for them. Paired with open-world localization and pose estimation (left), our method (middle) queries LLMs for the salient physical characteristics of mass, friction, and compliance as the basis for an adaptive grasp controller. DeliGrasp policies successfully grasp delicate and deformable objects (right). These policies also produce compliance feedback in the form of measured spring constants, which we leverage for downstream tasks like picking ripe produce (middle). Fine-tuning on this feedback expands LLM knowledge to bespoke objects.
Figure 2: (A) Our experimental setup: a tabletop UR5 robot arm equipped with the MAGPIE Gripper [magpie]. (B) Free-body diagram describing gripper interactions with an object at rest, adapted from [adaptive_grasp]. (C) The delicate objects dataset, ranging from 2 to 900 g and spanning various material properties.
Figure 3: (A) We compare mass estimates (rows) across different LLMs and prompting strategies (columns), including the base DeliGrasp "Thinker" prompt with GPT-4 (DG 4) and GPT-3.5-Turbo (DG 3.5); GPT-3.5-Turbo finetuned on the PhysObjects dataset (DG FT 3.5); GPT-4 and GPT-3.5-Turbo without finetuning but with chain-of-thought (CoT) physical-reasoning prompting (DG CoT 4 and DG CoT 3.5); and GPT-3.5-Turbo with both finetuning and CoT prompting (DG FT CoT 3.5). Both finetuning and CoT prompting improve mass estimates, and combining the two yields the most improved estimates. We also show how semantic modifiers, as in "empty paper cup" (B) versus "paper cup filled with water" (C), produce drastically different estimates of weight (25×) and other meta-parameters.
Figure 4: DeliGrasp adjusts the grasp force (A) for the action of "checking" the avocado, lowering it from the estimated 3.92 N to 0.5 N. Each grasp measures a spring constant k (B) without damaging the avocados. Such measurements can be used for downstream LLM-reasoning tasks (C) such as picking ripe produce or meal planning.
Figure : The delicate and deformable objects used for evaluation span from 0.8 to 900 g and from soft produce to rigid plastic, and they are commonly grasped in real-world environments like homes, grocery stores, and kitchens. We measure object width, mass, and approximate minimum grasping force, $F_{\min}$. Each "Object Description" input is paired with the grasp verb "pick" and passed to the DeliGrasp prompts to generate property estimates and grasp policies. We also specify what kind of damage, or "invalidating deformation," renders a grasp a failure.
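Figure 4 notes that each grasp yields a measured spring constant k. One common way to obtain such a value, shown here as a sketch rather than DeliGrasp's actual procedure, is to fit Hooke's law F = kx by least squares to force and finger-displacement samples recorded while the gripper closes. The function names, the sample format, and the units (metres and newtons, so k is in N/m) are assumptions for illustration; ranking objects from softest to stiffest is one plausible input to the downstream ripeness reasoning that Figure 4 describes.

```python
from __future__ import annotations

from typing import Sequence


def estimate_spring_constant(
    displacements_m: Sequence[float], forces_n: Sequence[float]
) -> float:
    """Least-squares fit of Hooke's law F = k * x through the origin.

    Minimizing sum_i (F_i - k * x_i)^2 over k gives
    k = sum(F_i * x_i) / sum(x_i^2).
    """
    if len(displacements_m) != len(forces_n) or len(displacements_m) == 0:
        raise ValueError("need equal-length, non-empty samples")
    sxx = sum(x * x for x in displacements_m)
    if sxx == 0:
        raise ValueError("displacements must not all be zero")
    sxf = sum(x * f for x, f in zip(displacements_m, forces_n))
    return sxf / sxx


def rank_by_stiffness(
    samples: dict[str, tuple[list[float], list[float]]],
) -> list[tuple[str, float]]:
    """Return (name, k) pairs sorted from softest (lowest k) to stiffest.

    Each value in `samples` is a hypothetical (displacements, forces) pair.
    """
    ks = {name: estimate_spring_constant(x, f) for name, (x, f) in samples.items()}
    return sorted(ks.items(), key=lambda item: item[1])
```

The readings passed to these functions would come from the gripper during a grasp; any numbers used to try them out are made up, not measurements from the paper.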
