CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization

Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Peter Jansen, Oyvind Tafjord, Niket Tandon, Li Zhang, Chris Callison-Burch, Peter Clark

TL;DR

CLIN tackles the problem of continual improvement for language-based agents operating in varied, task-rich environments without parameter updates. It introduces a memory-driven architecture that stores and updates causal abstractions, plus a meta-memory mechanism to generalize across tasks and environments. Empirical results on ScienceWorld show CLIN outperforming Reflexion by 23 points on adaptation and achieving notable zero-shot and continual improvements in new environments and tasks, driven by memory-based generalization. This work demonstrates a viable nonparametric learning paradigm for frozen LLMs, enabling rapid, sustained improvement through structured memory and memory-driven planning.

Abstract

Language agents have shown some ability to interact with an external environment, e.g., a virtual world such as ScienceWorld, to perform complex tasks, e.g., growing a plant, without the startup costs of reinforcement learning. However, despite their zero-shot capabilities, these agents to date do not continually improve over time beyond performance refinement on a specific task. Here we present CLIN, the first language-based agent to achieve this, so that it continually improves over multiple trials, including when both the environment and task are varied, and without requiring parameter updates. Our approach is to use a persistent, dynamic, textual memory centered on causal abstractions (rather than general "helpful hints") that is regularly updated after each trial so that the agent gradually learns useful knowledge for new trials. In the ScienceWorld benchmark, CLIN is able to continually improve on repeated trials on the same task and environment, outperforming state-of-the-art reflective language agents like Reflexion by 23 absolute points. CLIN can also transfer its learning to new environments (or new tasks), improving its zero-shot performance by 4 points (13 for new tasks) and can further improve performance there through continual memory updates, enhancing performance by an additional 17 points (7 for new tasks). This suggests a new architecture for agents built on frozen models that can still continually and rapidly improve over time.
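The memory update described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `CausalInsight`, `Memory`, `reflect`, and `run_trials` are hypothetical, the relation wording "may help to" is an assumed placeholder, and `reflect` is a hand-written stand-in for the prompted LLM that would generate the updated memory. The sketch also simply appends unseen insights, whereas the abstract only says the memory is updated after each trial.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CausalInsight:
    """One textual memory entry of the form '<action> <relation> <outcome>'."""
    action: str
    relation: str  # assumed wording, not taken from the paper
    outcome: str

    def render(self) -> str:
        return f"{self.action} {self.relation} {self.outcome}"


@dataclass
class Memory:
    """Persistent memory that survives across trials."""
    insights: list = field(default_factory=list)

    def update(self, new_insights):
        # Keep existing entries and append only insights not seen before.
        for insight in new_insights:
            if insight not in self.insights:
                self.insights.append(insight)


def reflect(trial_log, memory):
    """Stand-in for the prompted LLM: keep only steps marked as helpful.

    Each log entry is a hypothetical (action, outcome, helped) triple.
    The memory argument is unused here; a real reflection step would
    condition on it.
    """
    return [
        CausalInsight(action, "may help to", outcome)
        for action, outcome, helped in trial_log
        if helped
    ]


def run_trials(trial_logs):
    """Update the memory once after every trial, with no parameter updates."""
    memory = Memory()
    for log in trial_logs:
        memory.update(reflect(log, memory))
    return memory


# Toy logs loosely based on the plant-growing example.
logs = [
    [("going to the kitchen", "find seeds", True)],
    [
        ("going to the kitchen", "find seeds", True),
        ("moving seeds to the pot", "plant the seeds", True),
        ("opening the cupboard", "plant the seeds", False),
    ],
]
final_memory = run_trials(logs)
rendered = [insight.render() for insight in final_memory.insights]
```

Here the second trial adds one new insight and leaves the earlier one in place, which mirrors the idea that knowledge accumulates in text while the underlying model stays frozen.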

Paper Structure

This paper contains 15 sections, 12 figures, 2 tables, 1 algorithm.

Figures (12)

  • Figure 1: CLIN creates (Trial1) or adapts (Trial2+) a memory of causal abstractions to help in future trials by reflecting on the last trial and the current memory. It does this using a suitably prompted LLM to generate the updated memory (Section \ref{sec:memory}). Here, reflecting on Trial1, CLIN notes in memory that going to the kitchen helped with finding seeds, enabling it to find the seeds faster in Trial2. From there, it also learns that moving the seeds to the pot helped plant the seeds. To further generalize across episodes (sequences of trials, right figure) for use in new environments, CLIN generates a summary (“meta-memory”) of the best (starred) memories from each prior episode, here generating the generalization that moving to different rooms helps in finding objects (Section \ref{sec:metamemory}).
  • Figure 2: The architecture of CLIN. A controller takes the current task, retrievals from memory, and the trial so far, and generates the next goal to achieve. The executor then converts this goal into a valid action to perform. The simulator then performs the action and returns an observation of that action's effect. Memory is updated at the end of each trial by the memory generator (Section \ref{sec:memory}).
  • Figure 3: Continual Learning with CLIN
  • Figure 4: Rapid task adaptation with CLIN. (a) Example tasks where CLIN improves scores across trials. For CLIN, Trial-0 is the Base and Trial-4 is the Adapt. (b) Comparison of CLIN with Reflexion. (c) CLIN improves from Base to Adapt (full results in \ref{sec:more_results}).
  • Figure 5: Reward and #steps trends for CLIN in (a) Gen-Env and (b) Gen-Task. (c) % episode improvements and score change relative to CLIN without meta-memory (Gen-Task). (d) CLIN ablations.
  • ...and 7 more figures
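
The control flow in the Figure 2 caption (controller, then executor, then simulator) can be sketched as a simple loop. This is an illustrative sketch under stated assumptions, not the paper's code: `run_trial`, the toy components, and `VALID_ACTIONS` are all hypothetical, and the word-overlap matching stands in for whatever the real executor does to produce a valid action.

```python
VALID_ACTIONS = ["go to kitchen", "pick up seeds", "move seeds to pot"]


def toy_controller(task, memory, history):
    """Pick the next goal; here, follow the memory entries in order."""
    return memory[min(len(history), len(memory) - 1)]


def toy_executor(goal, history):
    """Map a free-text goal to the valid action sharing the most words with it."""
    goal_words = set(goal.lower().split())
    return max(VALID_ACTIONS, key=lambda a: len(goal_words & set(a.split())))


def toy_simulator(action):
    """Return an observation and a flag saying whether the task is finished."""
    done = action == "move seeds to pot"
    return f"You {action}.", done


def run_trial(task, memory, controller, executor, simulator, max_steps=10):
    """One trial: goal -> action -> observation, repeated until done."""
    history = []
    for _ in range(max_steps):
        goal = controller(task, memory, history)
        action = executor(goal, history)
        observation, done = simulator(action)
        history.append((goal, action, observation))
        if done:
            break
    return history


# Hypothetical memory contents retrieved for the task.
memory = [
    "go to the kitchen to find seeds",
    "pick up the seeds",
    "move the seeds to the pot",
]
history = run_trial("grow a plant", memory, toy_controller, toy_executor, toy_simulator)
actions = [action for _, action, _ in history]
```

Separating the goal from the action keeps the controller free to reason in open-ended text while the executor is the only component that must respect the simulator's action space. In the caption, the memory update happens once the trial ends, so it sits outside this loop.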