In-Memory Learning: A Declarative Learning Framework for Large Language Models

Bo Wang, Tianxiang Sun, Hang Yan, Siyin Wang, Qingyuan Cheng, Xipeng Qiu

TL;DR

This work proposes In-Memory Learning (IML), a declarative-memory-inspired framework that lets language-model agents self-improve without human labels by maintaining and refining memory notes through three phases: induction, revision, and inference. It frames the agent's operation as a POMDP with memory context notes $\phi$, demonstrates a gradient-like refinement process within memory (including a momentum mechanism), and introduces a dedicated benchmark to assess self-improvement. Experiments on a four-class, 10-dimension truth-table benchmark show model-dependent gains in inference and induction across different LLMs, while highlighting challenges such as local minima and parameter sensitivity. The findings suggest that memory-centric, declarative self-improvement is feasible for LLM-based agents, offering a path toward label-free continual learning, albeit with limitations in multimodality and model capacity.

Abstract

Whether agents can align with their environment without relying on human-labeled data is an intriguing research question. Drawing inspiration from the alignment process observed in intelligent organisms, where declarative memory plays a pivotal role in summarizing past experiences, we propose a novel learning framework. The agents distill insights from past experiences, refining and updating existing notes to enhance their performance in the environment. This entire process takes place within the memory components and is implemented through natural language, so we characterize this framework as In-Memory Learning. We also examine the key features of benchmarks designed to evaluate the self-improvement process. Through systematic experiments, we demonstrate the effectiveness of our framework and provide insights into this problem.

Paper Structure (28 sections, 1 equation, 7 figures, 1 table)


Figures (7)

  • Figure 1: Learning Pattern. Non-declarative learning, illustrated in the left panel, involves skills such as distinguishing relative pitches in music through practice; such skills are difficult to express verbally. In contrast, declarative learning, exemplified in the right panel, refers to acquiring knowledge that can be explicitly stated, such as the law of universal gravitation. Neural network models can develop the capability to answer questions through a gradient-based approach, and can also complete specific tasks using carefully designed prompts. This process closely resembles the learning process shown in the left panel.
  • Figure 2: Backward Process. The gradient-based learning process and In-Memory Learning (ours) share a similar structure.
  • Figure 3: The construction process of our benchmark. We pre-define a correspondence from the truth table to the labels ($y$) and wrap it with natural language. Each column of the truth table represents a dimension of creatures ($x_i$), corresponding to two lists of adjectives. For instance, the first column stands for the size of the creature, associating the value 0 with huge and 1 with tiny. A combination of words is randomly selected from the sets of adjectives and then interconnected with predefined prompts to formulate the final questions.
  • Figure 4: Accuracy curve over learning steps. The solid lines represent the smoothed curves. Both Llama2-70b-chat and GPT-3.5-turbo show an upward trend. Llama2-13b-chat also shows continuous improvement, but its performance is limited by its inference capabilities. Llama2-7b-chat initially improved but declined in later steps.
  • Figure 5: Momentum example. In the No Momentum setting, agents are free to create new notes without any constraints. In the Partially Momentum setting, agents are required to start with the initial words of the previous notes, which limits their freedom to make changes. The Full Momentum setting requires agents to make changes only if necessary, with the previous notes appended at the end of the prompts. The red underlined part in the reply marks the content modified relative to the previous notes.
  • ...and 2 more figures
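
The benchmark construction described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: only the size dimension (0 for huge, 1 for tiny) comes from the caption. The remaining adjective lists, the class names in `LABELS`, the truth-table rule in `label_of`, and the prompt wording in `make_question` are all hypothetical.

```python
import random

# One entry per truth-table column: (words for value 0, words for value 1).
# Only the first entry's "huge"/"tiny" comes from the caption; every other
# word here is a made-up placeholder for illustration.
ADJECTIVES = [
    (["huge", "enormous"], ["tiny", "small"]),  # size
    (["furry"], ["scaly"]),
    (["red"], ["blue"]),
    (["fast"], ["slow"]),
    (["loud"], ["quiet"]),
    (["winged"], ["wingless"]),
    (["friendly"], ["hostile"]),
    (["nocturnal"], ["diurnal"]),
    (["striped"], ["spotted"]),
    (["aquatic"], ["terrestrial"]),
]

# Hypothetical names for the four classes.
LABELS = ["alpha", "beta", "gamma", "delta"]


def label_of(bits):
    """Hypothetical truth-table rule: the first two columns decide the class."""
    return LABELS[2 * bits[0] + bits[1]]


def make_question(bits, rng):
    """Pick one adjective per dimension and wrap them in a fixed prompt."""
    words = [rng.choice(ADJECTIVES[i][b]) for i, b in enumerate(bits)]
    return f"A creature is {', '.join(words)}. Which species is it?"


def sample_example(rng):
    """Draw a random row of the truth table and return (question, label)."""
    bits = tuple(rng.randint(0, 1) for _ in ADJECTIVES)
    return make_question(bits, rng), label_of(bits)


if __name__ == "__main__":
    rng = random.Random(0)
    question, label = sample_example(rng)
    print(question)
    print(label)
```

Because the label depends only on the hidden bit pattern and not on the surface wording, an agent has to infer the adjective-to-value correspondence and the rule itself from experience, which is the kind of self-improvement the benchmark is meant to probe.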