From Local Corrections to Generalized Skills: Improving Neuro-Symbolic Policies with MEMO

Benjamin A. Christie, Yinlong Dai, Mohammad Bararjanianbahnamiri, Simon Stepputtis, Dylan P. Losey

TL;DR

Memory Enhanced Manipulation (MEMO) builds and maintains a retrieval-augmented skillbook gathered from human feedback and task successes. Retrieving from this skillbook lets the robot's policy generate new skills while reasoning over multi-task human feedback.

Abstract

Recent works use a neuro-symbolic framework for general manipulation policies. The advantage of this framework is that, by applying off-the-shelf vision and language models, the robot can break complex tasks down into semantic subtasks. However, the fundamental bottleneck is that the robot needs skills to ground these subtasks into embodied motions. Skills can take many forms (e.g., trajectory snippets, motion primitives, coded functions), but regardless of their form, skills act as a constraint. The high-level policy can only ground its language reasoning through the available skills; if the robot cannot generate the right skill for the current task, its policy will fail. We propose to address this limitation, and dynamically expand the robot's skills, by leveraging user feedback. When a robot fails, humans can intuitively explain what went wrong (e.g., "no, go higher"). While a simple approach is to recall this exact text the next time the robot faces a similar situation, we hypothesize that by collecting, clustering, and re-phrasing natural language corrections across multiple users and tasks, we can synthesize more general text guidance and coded skill templates. Applying this hypothesis, we develop Memory Enhanced Manipulation (MEMO). MEMO builds and maintains a retrieval-augmented skillbook gathered from human feedback and task successes. At run time, MEMO retrieves relevant text and code from this skillbook, enabling the robot's policy to generate new skills while reasoning over multi-task human feedback. Our experiments demonstrate that using MEMO to aggregate local feedback into general skill templates enables generalization to novel tasks where existing baselines fall short. See supplemental material here: https://collab.me.vt.edu/memo
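
To make the run-time retrieval step concrete, the following is a minimal sketch of a skillbook that stores text guidance and code templates per subtask and returns the most similar entries for a new subtask. Everything here is an illustrative assumption rather than the authors' implementation: the names `Entry`, `Skillbook`, and `retrieve` are hypothetical, the example entries are invented, and a bag-of-words cosine similarity stands in for whatever text embedding the actual system uses.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a toy stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Entry:
    subtask: str    # subtask the entry was recorded for
    guidance: str   # paraphrased human feedback (may be empty)
    code: str = ""  # skill template kept after a success (may be empty)


class Skillbook:
    """Hypothetical container; not the paper's data structure."""

    def __init__(self) -> None:
        self.entries: list[Entry] = []

    def add(self, entry: Entry) -> None:
        self.entries.append(entry)

    def retrieve(self, subtask: str, k: int = 2) -> list[Entry]:
        """Return up to k entries whose subtask text best matches the query."""
        query = bag_of_words(subtask)
        scored = [(cosine(query, bag_of_words(e.subtask)), e) for e in self.entries]
        scored = [pair for pair in scored if pair[0] > 0.0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [entry for _, entry in scored[:k]]


# Invented example entries, loosely modeled on the corrections quoted above.
book = Skillbook()
book.add(Entry("open the toaster door", "no, go higher"))
book.add(Entry("pour the can", "tilt slowly"))
book.add(Entry("open the cabinet door", "", "def open_door(handle): ..."))

hits = book.retrieve("open the microwave door", k=2)
```

In this toy run the two door-related entries outrank the pouring entry, so both a text correction and a code template would be handed to the policy as context for generating a new door-opening skill.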

Paper Structure (16 sections, 3 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: A neuro-symbolic policy tries to "toast food." (Top Left) The robot fails because it lacks the necessary skill for opening the toaster. (Top Right) A human gives feedback. MEMO collects this and other feedback into a retrieval-augmented skillbook, and then clusters the skillbook to extract generalized text and code templates. (Bottom) The generalized entries guide the policy's code generation for new skills, e.g., an open_door skill.
  • Figure 2: MEMO builds generalizable skills by retrieving, collecting, and clustering human feedback. MEMO first decomposes a task into its atomic actions (i.e., subtasks). Before selecting an action, it queries the skillbook for any subtask-relevant feedback or functions that it can reference. If the user intervenes, MEMO stores a paraphrased form of their feedback in the skillbook for future reference. Otherwise, MEMO stores the action as a skill template. Offline, MEMO clusters and compresses the entries in the skillbook to mitigate information loss and gather more general skills for future action generation.
  • Figure 3: Our setup for real-world experiments. Here the robot uses MEMO to "Empty the Cabinet" in a zero-shot manner.
  • Figure 4: Zero-shot success rate for our held-out evaluation tasks in simulation. The $x$-axis shows the size of the skillbook in terms of user hours. Note that as the skillbook grows, the performance of MEMO$-$C and DROC$-$V tends to stagnate without clustering local corrections into generalized guidance. The average zero-shot success rate at the onset of the study (with no entries in the skillbook) is approximately $30\%$; as the skillbook grows in size, MEMO approaches a success rate of $80\%$.
  • Figure 5: Breaking down Figure 4 with the complete $12$ hours of user feedback. Across the simulated evaluation tasks, MEMO achieves a higher zero-shot success rate than alternatives. Without the inclusion of offline clustering, MEMO is unable to generate the necessary skills for "Empty the Cabinet", "Close the Bottle", and "Pour the Can." On unseen tasks, MEMO achieves a success rate of $78\%$ as compared to $40\%$ for DROC$-$V.
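
The offline step described in the Figure 2 caption (cluster the skillbook entries, then compress each cluster into more general guidance) can be illustrated with a small sketch. This is an assumption-laden toy, not the method used in the paper: the function names `cluster_feedback` and `compress` are hypothetical, the feedback strings are invented, greedy Jaccard-threshold grouping stands in for the actual clustering, and keeping only the shared words stands in for the re-phrasing step.

```python
from __future__ import annotations

import re


def words(text: str) -> list[str]:
    """Lowercased words in order of appearance."""
    return re.findall(r"[a-z]+", text.lower())


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two word sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_feedback(feedback: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Greedy single pass: join the first cluster whose seed item is similar enough."""
    clusters: list[list[str]] = []
    for item in feedback:
        for cluster in clusters:
            if jaccard(set(words(item)), set(words(cluster[0]))) >= threshold:
                cluster.append(item)
                break
        else:
            # No existing cluster matched, so this item seeds a new one.
            clusters.append([item])
    return clusters


def compress(cluster: list[str]) -> str:
    """Placeholder for re-phrasing: keep the words shared by every item."""
    shared = set.intersection(*(set(words(item)) for item in cluster))
    ordered = [w for w in words(cluster[0]) if w in shared]
    return " ".join(dict.fromkeys(ordered))  # drop repeats, keep order


# Invented local corrections collected across several tasks.
feedback = [
    "grasp the toaster handle higher",
    "grasp the cabinet handle higher",
    "tilt the can slowly",
    "grasp the microwave handle higher",
]

clusters = cluster_feedback(feedback)
general = [compress(c) for c in clusters]
```

Here three object-specific corrections collapse into one object-agnostic entry ("grasp the handle higher"), which is the kind of generalization the captions attribute to clustering; the unrelated pouring correction stays in its own cluster.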