Lifelong Robot Learning with Human Assisted Language Planners

Meenal Parakh; Alisha Fong; Anthony Simeonov; Tao Chen; Abhishek Gupta; Pulkit Agrawal

TL;DR

This work tackles the fixed-skill limitation of LLM-based robotic planners by enabling lifelong learning through interactive skill acquisition. It introduces a modular system with perception producing spatially-grounded scene descriptions, a GPT-4–driven planner, and a Python code API for skills, augmented by a learn_skill interface. New skills are grounded quickly via Neural Descriptor Fields from few demonstrations, allowing rapid expansion and reuse in future tasks, thereby enabling open-world manipulation and continual learning. Real-world and simulated experiments demonstrate that the planner can request, acquire, and reuse new skills to satisfy complex tasks, while an LLM-only evaluation reveals the model’s capacity and limitations in skill growth and reuse.

Abstract

Large Language Models (LLMs) have been shown to act like planners that can decompose high-level instructions into a sequence of executable instructions. However, current LLM-based planners are only able to operate with a fixed set of skills. We overcome this critical limitation and present a method for using LLM-based planners to query new skills and teach robots these skills in a data- and time-efficient manner for rigid object manipulation. Our system can re-use newly acquired skills for future tasks, demonstrating the potential of open-world and lifelong learning. We evaluate the proposed framework on multiple tasks in simulation and the real world. Videos are available at: https://sites.google.com/mit.edu/halp-robot-learning.
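
The skill-expansion loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `SkillLibrary` class, the `execute_plan` helper, the plan format, and the `fake_demo_provider` stand-in for human demonstrations are all assumptions made for illustration. Only the `learn_skill` name comes from the paper, and NDF-based grounding is replaced here by a placeholder callable.

```python
from typing import Callable, Dict, List, Tuple

Skill = Callable[..., str]
Step = Tuple[str, Tuple[str, ...]]  # (skill name, arguments); hypothetical plan format


class SkillLibrary:
    """Hypothetical registry of named skills that can grow over time."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, name: str, fn: Skill) -> None:
        self._skills[name] = fn

    def has(self, name: str) -> bool:
        return name in self._skills

    def names(self) -> List[str]:
        return sorted(self._skills)

    def call(self, name: str, *args: str) -> str:
        return self._skills[name](*args)


def learn_skill(library: SkillLibrary, skill_name: str,
                demo_provider: Callable[[str], Skill]) -> None:
    """Stand-in for skill grounding: obtain a new skill and store it for reuse."""
    library.register(skill_name, demo_provider(skill_name))


def execute_plan(library: SkillLibrary, plan: List[Step],
                 demo_provider: Callable[[str], Skill]) -> List[str]:
    """Run each step; a learn_skill step expands the library before later steps."""
    log: List[str] = []
    for name, args in plan:
        if name == "learn_skill":
            learn_skill(library, args[0], demo_provider)
            log.append(f"learned {args[0]}")
        elif library.has(name):
            log.append(library.call(name, *args))
        else:
            log.append(f"failed: unknown skill {name}")
            break
    return log


# Assumed starting skills; the real system's primitives may differ.
library = SkillLibrary()
library.register("pick", lambda obj: f"picked {obj}")
library.register("place", lambda obj, target: f"placed {obj} on {target}")


def fake_demo_provider(skill_name: str) -> Skill:
    # Placeholder for human demonstrations; returns a dummy skill.
    return lambda obj: f"{skill_name}({obj}) done"


plan: List[Step] = [
    ("learn_skill", ("pick_by_handle",)),
    ("pick_by_handle", ("mug",)),
    ("place", ("mug", "shelf")),
]
log = execute_plan(library, plan, fake_demo_provider)
```

Once `pick_by_handle` is registered, a later plan can call it directly without another `learn_skill` step, which is the reuse behavior the abstract describes.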

Paper Structure (41 sections, 11 figures, 2 tables)

This paper contains 41 sections, 11 figures, 2 tables.

Introduction
Related Work
LLMs as Zero-Shot Planners
End-to-End Language Conditioned Manipulation
Low-Level Robot Primitives
Method
Perception
Spatially-grounded Textual Scene Description
Planning and Control
Skill Definitions via Code API
Full Planner Input/Output and Skill Execution
Learning New Skills and Expanding the Skill Library
Requesting New Abilities with learn_skill function
Data- and time-efficient skill grounding with NDFs
Learning from Feedback
...and 26 more sections

Figures (11)

Figure 1: Our system consists of three modules: perception, planning, and control. The perception module processes RGB-D images and outputs a textual scene description that identifies objects and their spatial relationships. The planning module uses GPT-4 to plan a sequence of steps based on the available skills and the task command. We added a learn_skill(skill_name) function to the planner so that it can plan to learn a new skill if such learning is necessary for completing the task. Finally, the control module executes the planned steps using the available skills or starts learning a new skill.
Figure 2: (a) From RGB-D images, our perception module obtains information about the objects and their relations, creates an object information dictionary, and generates a scene description (detection, object pairs corresponding to given object relations, and the template is in black). (b) An example showing the interaction between the robot, the user, and the planner.
Figure 3: High-level plan and images for three tasks requiring a new skill: (A) Grasp mug by the handle, (B) Place bottle in container on its side, and (C) Empty the sink. The gray comments represent execution feedback while the green text is human feedback. When learn_skill is not available, the robot fails to complete the tasks. However, by learning new skills, the planner expands its abilities and satisfies each task requirement.
Figure A1: Example plan for "box over a mug" task, with and without spatially-grounded scene description information in the input. If the scene description lacks spatial information, the planner fails to communicate that the box must be removed before picking up the mug.
Figure A2: Examples for the find primitive in the API. Numbers in green pass the threshold and the object is found. Numbers in red are below the threshold and no object is found. The green-highlighted examples are true positives, while the red-highlighted example is a false positive.
...and 6 more figures
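
The spatially-grounded scene description mentioned in Figures 2(a) and A1 can be sketched as a template filled from detected objects and pairwise relations. This is a minimal illustration under stated assumptions: the `describe_scene` function, the relation triples, and the sentence template are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Relation = Tuple[str, str, str]  # (subject, relation phrase, target); assumed format


def describe_scene(objects: List[str], relations: List[Relation]) -> str:
    """Fill a fixed text template with detected objects and their spatial relations."""
    lines = ["Objects in the scene: " + ", ".join(objects) + "."]
    for subject, relation, target in relations:
        lines.append(f"The {subject} is {relation} the {target}.")
    return "\n".join(lines)


# Mirrors the "box over a mug" setting from Figure A1 with made-up detections.
description = describe_scene(
    ["box", "mug", "table"],
    [("box", "on top of", "mug"), ("mug", "on top of", "table")],
)
print(description)
```

With the relation lines present, a planner reading this text has the information it needs to infer that the box must be moved before the mug can be picked up; dropping the relations leaves only the object list.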
