GenCHiP: Generating Robot Policy Code for High-Precision and Contact-Rich Manipulation Tasks

Kaylee Burns, Ajinkya Jain, Keegan Go, Fei Xia, Michael Stark, Stefan Schaal, Karol Hausman

TL;DR

GenCHiP investigates whether large language models can generate robot policy code for high-precision, contact-rich manipulation by reparameterizing the action space to enforce impedance/admittance constraints. By exposing a compliant control space and using targeted prompting strategies, GenCHiP enables LLMs to produce executable policies that reason about forces and contacts, even under perceptual noise. Across the Functional Manipulation Benchmark and NIST Task Board tasks, GenCHiP outperforms contact-unaware baselines by more than 3x–4x, demonstrating strong generalization to varied object geometries and task conditions. The work highlights the practical potential of LLMs to automate low-level parameterization of dexterous manipulation policies, reducing manual tuning and enabling broader applicability in real-world robotics.

Abstract

Large Language Models (LLMs) have been successful at generating robot policy code, but so far these results have been limited to high-level tasks that do not require precise movement. It is an open question how well such approaches work for tasks that require reasoning over contact forces and working within tight success tolerances. We find that, with the right action space, LLMs are capable of successfully generating policies for a variety of contact-rich and high-precision manipulation tasks, even under noisy conditions, such as perceptual errors or grasping inaccuracies. Specifically, we reparameterize the action space to include compliance with constraints on the interaction forces and stiffnesses involved in reaching a target pose. We validate this approach on subtasks derived from the Functional Manipulation Benchmark (FMB) and NIST Task Board Benchmarks. Exposing this action space alongside methods for estimating object poses improves policy generation with an LLM by greater than 3x and 4x when compared to non-compliant action spaces.
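
To make the idea of a compliance-aware action space concrete, here is a minimal sketch of what one such action might look like. It is not the paper's API: the class name `CompliantMove`, its fields, the units, and the helper functions are all hypothetical. The force estimate assumes a simple quasi-static spring law (force = stiffness × position error), which is a common simplification of Cartesian impedance control and not something the abstract specifies.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class CompliantMove:
    """Hypothetical action: reach a target while bounding stiffness and force."""
    target_xyz: Vec3   # target position in metres (frame is an assumption)
    stiffness: Vec3    # per-axis stiffness in N/m; lower means more compliant
    max_force: Vec3    # per-axis force limit in N

    def __post_init__(self) -> None:
        # Reject physically meaningless parameters early.
        if any(k <= 0 for k in self.stiffness):
            raise ValueError("stiffness must be positive on every axis")
        if any(f <= 0 for f in self.max_force):
            raise ValueError("max_force must be positive on every axis")


def steady_state_force(move: CompliantMove, actual_xyz: Vec3) -> Vec3:
    """Spring-law force per axis if the tool is blocked at actual_xyz (assumed model)."""
    return tuple(
        k * (t - a)
        for k, t, a in zip(move.stiffness, move.target_xyz, actual_xyz)
    )


def exceeds_limit(move: CompliantMove, actual_xyz: Vec3) -> bool:
    """True if the spring-law force would exceed the limit on any axis."""
    forces = steady_state_force(move, actual_xyz)
    return any(abs(f) > lim for f, lim in zip(forces, move.max_force))


# Usage: the tool is blocked 2 cm above its target by a surface.
soft = CompliantMove((0.0, 0.0, 0.10), (500.0, 500.0, 200.0), (10.0, 10.0, 5.0))
stiff = CompliantMove((0.0, 0.0, 0.10), (500.0, 500.0, 2000.0), (10.0, 10.0, 5.0))
blocked_at = (0.0, 0.0, 0.12)
soft_ok = not exceeds_limit(soft, blocked_at)    # about 4 N on z, under the 5 N limit
stiff_bad = exceeds_limit(stiff, blocked_at)     # about 40 N on z, over the 5 N limit
```

Under these assumptions, the same 2 cm position error stays within the force limit at low stiffness and violates it at high stiffness, which illustrates why exposing stiffness and force bounds matters when small pose errors are expected.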

Paper Structure

This paper contains 28 sections, 2 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: (a) We prompt an LLM to generate code for high-precision tasks. By using an action space that parameterizes compliant behavior, the LLM is able to generate action sequences for contact-rich tasks like peg insertion. (b) Language models' ability to reason about object geometry and make plans by using world knowledge about different objects enables zero-shot generalization to new tasks.
  • Figure 2: We generate code by formatting natural language requests and instructions as comments. Generations are highlighted in blue.
  • Figure 3: We present information about the task and control API via prompting. The API description is the same across all environments, the hints and examples are the same within each environment, while the task description must be modified to describe each task.
  • Figure 4: Generating code with point-to-point moves limits policies to free-space motions. For this policy to run successfully, displacement along the z-axis in the second and fifth actions must be exact.
  • Figure 5: Compliance prevents the robot from faulting when in contact. Conditional termination constraints enable the language model to reason about contact forces. In this example, the robot moves a cable back and forth until no more upward force is detected, which indicates that an opening has been found.
  • ...and 5 more figures
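
The behavior described in the Figure 5 caption (move back and forth until no more upward force is detected) can be sketched as a force-terminated search loop. This is an illustrative toy, not the code the LLM generates in the paper: `search_for_opening`, `ToySurface`, the one-dimensional geometry, and all thresholds are hypothetical, and the simulated force sensor stands in for a real compliant controller.

```python
from typing import Callable, Optional


def search_for_opening(
    read_upward_force: Callable[[], float],
    move_to_x: Callable[[float], None],
    start_x: float,
    amplitude: float,
    step: float,
    force_threshold: float = 0.5,
    max_sweeps: int = 4,
) -> Optional[float]:
    """Sweep back and forth around start_x until the upward force drops.

    Returns the x position where the force fell below force_threshold,
    or None if no opening was found within max_sweeps sweeps.
    """
    n = int(round(amplitude / step))
    offsets = [i * step for i in range(-n, n + 1)]
    for sweep in range(max_sweeps):
        # Alternate direction on each sweep to get back-and-forth motion.
        path = offsets if sweep % 2 == 0 else offsets[::-1]
        for dx in path:
            x = start_x + dx
            move_to_x(x)
            # Termination condition: no more upward reaction force.
            if read_upward_force() < force_threshold:
                return x
    return None


class ToySurface:
    """1-D stand-in for a surface with a hole; pushes back everywhere else."""

    def __init__(self, hole_center: float, hole_half_width: float,
                 contact_force: float = 3.0) -> None:
        self.hole_center = hole_center
        self.hole_half_width = hole_half_width
        self.contact_force = contact_force
        self.x = 0.0

    def move_to_x(self, x: float) -> None:
        self.x = x

    def read_upward_force(self) -> float:
        inside = abs(self.x - self.hole_center) <= self.hole_half_width
        return 0.0 if inside else self.contact_force


# Usage: the hole is offset 4 mm from where perception placed it.
surface = ToySurface(hole_center=0.004, hole_half_width=0.0012)
found = search_for_opening(surface.read_upward_force, surface.move_to_x,
                           start_x=0.0, amplitude=0.01, step=0.001)
```

The design point this sketch is meant to highlight is that the stopping rule is a sensed force condition and not an exact displacement, so the motion tolerates an error in the estimated hole position, in contrast to the point-to-point policy described in the Figure 4 caption.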