Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections

Lihan Zha, Yuchen Cui, Li-Heng Lin, Minae Kwon, Montserrat Gonzalez Arenas, Andy Zeng, Fei Xia, Dorsa Sadigh

TL;DR

The paper addresses the poor generalization of robot manipulation policies with DROC, an LLM-based framework that handles arbitrary online language corrections, distills generalizable knowledge from correction histories, and retrieves relevant past experiences via textual and visual similarity. DROC combines three components: a correction handler that generates corrected plans or code, a knowledge extractor that distills task- and skill-level knowledge into a knowledge base, and a knowledge retriever that fetches relevant experiences for new tasks. Experiments on a Franka Panda show that DROC outperforms Code-as-Policies baselines, needs about half as many corrections in the first round, and keeps improving over successive iterations, with visual retrieval proving crucial for robust generalization. The approach advances long-horizon robotic manipulation by combining LLM reasoning with multimodal retrieval and knowledge management to support adaptation in unseen environments.

Abstract

Today's robot policies exhibit subpar performance when faced with the challenge of generalizing to novel environments. Human corrective feedback is a crucial form of guidance to enable such generalization. However, adapting to and learning from online human corrections is a non-trivial endeavor: robots need to remember human feedback over time so that they can retrieve the right information in new settings and reduce the intervention rate, and they also need to respond to arbitrary feedback, ranging from corrections about high-level human preferences to low-level adjustments of skill parameters. In this work, we present Distillation and Retrieval of Online Corrections (DROC), a large language model (LLM)-based system that can respond to arbitrary forms of language feedback, distill generalizable knowledge from corrections, and retrieve relevant past experiences based on textual and visual similarity for improving performance in novel settings. DROC is able to respond to a sequence of online language corrections that address failures in both high-level task plans and low-level skill primitives. We demonstrate that DROC effectively distills the relevant information from the sequence of online corrections into a knowledge base and retrieves that knowledge in settings with new task or object instances. DROC outperforms other techniques that directly generate robot code via LLMs, using only half of the total number of corrections needed in the first round and requiring little to no correction after two iterations. We show further results, videos, prompts and code on https://sites.google.com/stanford.edu/droc.

Paper Structure (5 sections, 4 figures, 2 tables)

This paper contains 5 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of DROC with example task "put the scissors in the top drawer": the human interrupts the robot when it attempts to pick up the scissors before opening the drawer, the correction handler regenerates the plan accordingly, and the knowledge extractor extracts a high-level constraint; while the robot executes the skill of opening the top drawer, the human interrupts again to correct the robot's grasping pose by providing two low-level commands.
  • Figure 2: Example of Visual Retrieval. To retrieve the relevant task for opening the bottom gray drawer, textual similarity of the task instructions alone cannot identify the correct experience to reuse; similarity between visual features of the object (specifically the drawer handles) is important for retrieving the correct past experience.
  • Figure 3: Skill-level results. For all tasks, the results are averaged over six rounds of experiments. The error bars reflect the standard errors across different rounds. Each iteration corresponds to a different task setting. The number of corrections declines as the iteration count increases, which shows that DROC can generalize and adapt to unseen settings. For the "Hang Cup on Rack" task, we do not show the decline of corrections over iterations but instead ablate the correction and distillation module of our system.
  • Figure 4: Illustrative examples for plan-level test cases. (1) Upon interruption, DROC responds to the correction by identifying that it is a plan-level error and replanning, and distills the constraint for future tasks; (2) given a task with ambiguity, DROC retrieves past experiences based on semantic and visual similarities.
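
The retrieval idea in the Figure 2 caption can be illustrated with a minimal sketch. This is not DROC's implementation: the `Experience` record, the `retrieve` function, the weighted sum of cosine similarities, the `text_weight` parameter, and the toy two-dimensional embeddings are all assumptions made for illustration. The sketch only shows why two stored experiences that look identical in text space can still be told apart once a visual score is added.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Experience:
    """Hypothetical knowledge-base entry (field names are assumptions)."""
    instruction: str
    text_emb: Sequence[float]    # embedding of the task instruction
    visual_emb: Sequence[float]  # embedding of the object's appearance


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_text, query_visual, knowledge_base, text_weight=0.5):
    """Return the entry with the highest weighted text + visual similarity.

    The weighted sum is an assumed scoring rule, not the paper's.
    """
    def score(exp: Experience) -> float:
        return (text_weight * cosine(query_text, exp.text_emb)
                + (1.0 - text_weight) * cosine(query_visual, exp.visual_emb))
    return max(knowledge_base, key=score)


# Toy data: both stored instructions embed to the same text vector,
# so only the visual embeddings distinguish them.
kb = [
    Experience("open drawer A", [1.0, 0.0], [1.0, 0.0]),
    Experience("open drawer B", [1.0, 0.0], [0.0, 1.0]),
]
query_text = [1.0, 0.0]
query_visual = [0.1, 0.9]  # looks like drawer B's handle

text_only = retrieve(query_text, query_visual, kb, text_weight=1.0)
combined = retrieve(query_text, query_visual, kb, text_weight=0.5)
print(text_only.instruction, "|", combined.instruction)
```

With `text_weight=1.0` the two entries tie and the first one is returned arbitrarily; with the visual term included, the entry whose appearance matches the query is selected.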