Lifelong Robot Library Learning: Bootstrapping Composable and Generalizable Skills for Embodied Control with Language Models

Georgios Tziafas, Hamidreza Kasaei

TL;DR

This paper addresses lifelong, language-grounded manipulation by introducing LRLL, a memory-augmented, gradient-free agent that continuously grows a library of composable robot skills. It combines wake-sleep cycles, a soft experience memory, and a skill abstraction module that distills past interactions into new Python-based primitives, yielding scalable, interpretable policies without fine-tuning. In simulation, LRLL outperforms end-to-end and static-LLM baselines, and its learned skills transfer to real-world dual-arm manipulation, with gains in generalization and memory efficiency and without catastrophic forgetting. The approach points toward scalable, human-in-the-loop, language-grounded robotic systems that expand their own capabilities without gradient-based optimization; the authors flag multimodal perception and faster, cheaper LLMs as future work for practical deployment.
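
The TL;DR compresses the method into one loop. For readers who think in code, a minimal sketch of how such a wake-sleep cycle might be organized is shown below; every object and method name (`propose_tasks`, `write_policy`, `abstract_skill`, and the `memory`/`library`/`env`/`llm` objects) is a hypothetical placeholder, not the authors' API.

```python
# Illustrative skeleton of an LRLL-style wake-sleep cycle (not the authors' code).

def wake_phase(demos, hints, memory, library, env, llm):
    """Collect verified (task, policy-code) experiences in simulation."""
    for task in llm.propose_tasks(demos, hints):        # exploration module
        context = memory.retrieve(task)                 # past experiences as LLM context
        code = llm.write_policy(task, context, library) # actor generates policy code
        if env.execute_and_verify(code, task):          # critic checks the outcome
            memory.store(task, code)

def sleep_phase(memory, library, env, llm):
    """Distill recent experiences into new skills, then compress the memory."""
    for cluster in memory.cluster_by_ast():             # group structurally similar code
        library.add(llm.abstract_skill(cluster))        # propose a reusable Python skill
    memory.refactor_and_replay(library, env)            # rewrite experiences with new skills

def lifelong_loop(curriculum, memory, library, env, llm):
    for demos, hints in curriculum:                     # one cycle per human-guided stage
        wake_phase(demos, hints, memory, library, env, llm)
        sleep_phase(memory, library, env, llm)
```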

Abstract

Large Language Models (LLMs) have emerged as a new paradigm for embodied reasoning and control, most recently by generating robot policy code that utilizes a custom library of vision and control primitive skills. However, prior work fixes its skill library and steers the LLM with carefully hand-crafted prompts, limiting the agent to a stationary range of addressable tasks. In this work, we introduce LRLL, an LLM-based lifelong learning agent that continuously grows the robot skill library to tackle manipulation tasks of ever-growing complexity. LRLL achieves this with four novel contributions: 1) a soft memory module that allows dynamic storage and retrieval of past experiences to serve as context, 2) a self-guided exploration policy that proposes new tasks in simulation, 3) a skill abstractor that distills recent experiences into new library skills, and 4) a lifelong learning algorithm that enables human users to bootstrap new skills with minimal online interaction. LRLL continuously transfers knowledge from the memory to the library, building composable, general, and interpretable policies while bypassing gradient-based optimization, thus relieving the learner from catastrophic forgetting. Empirical evaluation in a simulated tabletop environment shows that LRLL outperforms end-to-end and vanilla LLM approaches in the lifelong setup while learning skills that are transferable to the real world. Project material will be made available at https://gtziafas.github.io/LRLL_project.
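
The first contribution, the soft memory module, stores past (instruction, code) experiences and retrieves the most relevant ones to serve as in-context examples for the LLM. Below is a minimal, self-contained sketch assuming cosine-similarity retrieval over instruction embeddings; the class name, the `embed_fn` interface, and the retrieval rule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class SoftMemory:
    """Hypothetical sketch of a soft experience memory: (instruction, code)
    pairs are keyed by instruction embeddings and retrieved by cosine
    similarity, so the most relevant past experiences become LLM context."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn              # text -> 1-D numpy vector
        self.keys = []                        # unit-norm embedding per experience
        self.experiences = []                 # (instruction, policy_code) pairs

    def store(self, instruction: str, policy_code: str) -> None:
        v = self.embed_fn(instruction)
        self.keys.append(v / np.linalg.norm(v))
        self.experiences.append((instruction, policy_code))

    def retrieve(self, query: str, k: int = 5) -> list[tuple[str, str]]:
        if not self.keys:
            return []
        q = self.embed_fn(query)
        sims = np.stack(self.keys) @ (q / np.linalg.norm(q))  # cosine similarities
        return [self.experiences[i] for i in np.argsort(-sims)[:k]]
```

With `embed_fn` set to any sentence-embedding model, `retrieve(task)` returns the k most similar stored experiences to paste into the prompt. Because storage and retrieval are non-parametric, adding new experiences never overwrites old ones, consistent with the paper's claim that bypassing gradient updates avoids catastrophic forgetting.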

Paper Structure

This paper contains 13 sections, 1 equation, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Wake-sleep library learning from human guidance.
  • Figure 2: Overview of an LRLL learning cycle. At the beginning of the wake phase, a human user provides demonstrations and hints, from which an LLM-based exploration module proposes tasks to complete, while an LLM-based actor-critic agent interacts with the environment to execute and verify them. During sleep, the experiences are clustered according to their code's abstract syntax trees (ASTs) and distilled into new skills by an LLM abstractor. The new skills refactor the acquired experiences, which are replayed in the environment to compress the memory. For brevity, the actor-critic modules are not shown during replay in the sleep phase. Examples are drawn from the first curriculum cycle (spatial coordination). (A sketch of one way such AST-based clustering could work appears after this figure list.)
  • Figure 3: t-SNE projections of train (SA), test (UA+UI), and explored task-instruction embeddings for two of our curriculum cycles: a) Visual Reasoning (right) and b) Rearrangement (left). The exploration module augments the agent's experiences with task variations that cover a broad range of skill compositions from the demos. (Best viewed in color.)
  • Figure 4: (Left) Number of stored experiences per cycle before and after the sleep phase of LRLL; sleep compresses the number of experiences needed to reach the same performance. (Right) Average success rate on all unseen instructions vs. the number of retrieved experiences; the sleep phase refactors experiences, so fewer in-context examples provide sufficient context.
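
Figure 2 describes clustering experiences by their code's abstract syntax trees before abstraction. One simple way such grouping could work, using Python's standard `ast` module, is sketched below; the skill names (`pick`, `place_left`) and the node-type signature heuristic are assumptions for illustration, not the paper's exact procedure.

```python
import ast
from collections import defaultdict

def ast_signature(code: str) -> str:
    """Structural fingerprint of a snippet: the sequence of AST node types,
    ignoring identifiers and constants, so snippets that compose the same
    skills in the same pattern map to the same signature."""
    return " ".join(type(node).__name__ for node in ast.walk(ast.parse(code)))

def cluster_by_ast(experiences: list[tuple[str, str]]) -> list[list[tuple[str, str]]]:
    """Group (instruction, code) experiences whose code shares an AST shape."""
    groups: dict[str, list] = defaultdict(list)
    for instruction, code in experiences:
        groups[ast_signature(code)].append((instruction, code))
    return list(groups.values())

# Two structurally identical policies land in the same cluster:
demos = [
    ("put the apple left of the bowl", "pick('apple'); place_left('bowl')"),
    ("put the mug left of the plate",  "pick('mug'); place_left('plate')"),
]
print(len(cluster_by_ast(demos)))  # -> 1
```

Each such cluster is then a natural candidate for the abstractor to turn into a single parameterized skill (e.g. one place-relative-to-object function covering both demos).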