Larimar: Large Language Models with Episodic Memory Control

Payel Das, Subhajit Chaudhury, Elliot Nelson, Igor Melnyk, Sarath Swaminathan, Sihui Dai, Aurélie Lozano, Georgios Kollias, Vijil Chenthamarakshan, Jiří Navrátil, Soham Dan, Pin-Yu Chen

TL;DR

Larimar introduces a brain-inspired external episodic memory to augment large language models, enabling one-shot, training-free updates to factual knowledge while preserving model performance on prior knowledge. The architecture combines an encoder, a fixed-size memory, and a decoder: memory reads and writes condition generation, and a least-squares memory update makes edits efficient and gradient-free. It supports selective forgetting and information-leakage prevention, and uses a scope detector to decide whether a query concerns the edited facts, so that memory-conditioned outputs are produced only for in-scope queries. Empirically, Larimar achieves competitive single-fact editing results, strong retention in sequential edits, and substantial speedups (4–10×) over strong baselines, with promising generalization to longer input contexts. The work provides a simple, model-agnostic framework for real-time knowledge updating with potential impact on deployment, privacy, and bias control in LLM systems.
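The least-squares memory update mentioned above can be illustrated with a minimal NumPy sketch. It assumes a memory matrix with K slots of dimension C and computes both the writing weights and the reading weights with the pseudoinverse (the option that Figure 5 compares against a Gaussian convolution). The function names `write_memory` and `read_memory`, the random prior `M0`, and the sizes are hypothetical, and the encoder, decoder, and noise terms of the actual model are omitted.

```python
import numpy as np


def write_memory(Z, M0):
    """Write N encoded inputs Z (N x C) into a K-slot memory with prior M0 (K x C)."""
    # Writing weights: least-squares solution of W0 @ M0 ~= Z via the pseudoinverse.
    W0 = Z @ np.linalg.pinv(M0)          # (N x K)
    # Memory update: least-squares solution of W0 @ M ~= Z; no gradient steps involved.
    M = np.linalg.pinv(W0) @ Z           # (K x C)
    return M, W0


def read_memory(Z_query, M):
    """Read from memory M (K x C) for encoded queries Z_query (Nq x C)."""
    W = Z_query @ np.linalg.pinv(M)      # reading weights (Nq x K)
    return W @ M                         # readout Z_r: weighted sum of slots (Nq x C)


rng = np.random.default_rng(0)
K, C, N = 64, 128, 10                    # slots, latent size, facts (illustrative values)
M0 = rng.standard_normal((K, C))         # hypothetical random prior memory
Z = rng.standard_normal((N, C))          # stand-in for encoder outputs
M, W0 = write_memory(Z, M0)
Z_r = read_memory(Z, M)
print(np.allclose(Z_r, Z, atol=1e-6))    # check that written encodings are recovered
```

Because each write is a closed-form least-squares solve, adding facts needs no gradient steps; with fewer facts than slots (N < K), reading a written encoding back returns it up to numerical error.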

Abstract

Efficient and accurate updating of knowledge stored in Large Language Models (LLMs) is one of the most pressing research challenges today. This paper presents Larimar, a novel, brain-inspired architecture for enhancing LLMs with a distributed episodic memory. Larimar's memory allows for dynamic, one-shot updates of knowledge without the need for computationally expensive re-training or fine-tuning. Experimental results on multiple fact-editing benchmarks demonstrate that Larimar not only attains accuracy comparable to most competitive baselines, even in the challenging sequential editing setup, but also excels in speed, yielding speed-ups of 8-10x depending on the base LLM, and in flexibility, since the proposed architecture is simple, LLM-agnostic, and hence general. We further provide mechanisms for selective fact forgetting, information leakage prevention, and input context length generalization with Larimar and show their effectiveness. Our code is available at https://github.com/IBM/larimar
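Selective forgetting and sequential editing are easiest to see in a linear associative memory whose least-squares solution is kept as running sums. The sketch below is a generic ridge-regression memory, not necessarily the paper's update rule: the class `LinearMemory`, the regularizer `lam`, and the use of random vectors for the addresses `w` and encodings `z` are all illustrative assumptions. Writing adds rank-one terms to the sufficient statistics, and forgetting subtracts the same terms.

```python
import numpy as np


class LinearMemory:
    """Hypothetical ridge least-squares memory mapping addresses w (K,) to values z (C,).

    Keeps the sufficient statistics A = lam*I + sum(w w^T) and B = sum(w z^T),
    so the memory M = A^{-1} B can be updated one fact at a time without gradients.
    """

    def __init__(self, K, C, lam=1e-3):
        self.A = lam * np.eye(K)
        self.B = np.zeros((K, C))

    def write(self, w, z):
        # Add one fact: rank-one update of both statistics.
        self.A += np.outer(w, w)
        self.B += np.outer(w, z)

    def forget(self, w, z):
        # Remove a previously written fact by subtracting the same rank-one terms.
        self.A -= np.outer(w, w)
        self.B -= np.outer(w, z)

    @property
    def M(self):
        return np.linalg.solve(self.A, self.B)   # (K x C) memory matrix

    def read(self, w):
        return w @ self.M                         # (C,) readout for address w


rng = np.random.default_rng(1)
K, C = 32, 16
ws = rng.standard_normal((3, K))   # addresses of three facts (random stand-ins)
zs = rng.standard_normal((3, C))   # encodings of three facts (random stand-ins)

mem = LinearMemory(K, C)
for w, z in zip(ws, zs):
    mem.write(w, z)
mem.forget(ws[2], zs[2])            # selectively forget the third fact

ref = LinearMemory(K, C)            # memory that only ever saw the first two facts
for w, z in zip(ws[:2], zs[:2]):
    ref.write(w, z)
print(np.allclose(mem.M, ref.M))
```

After `forget`, the memory matches one that never saw the third fact, while the remaining facts can still be read back (up to the small ridge bias from `lam`). Keeping the sufficient statistics, rather than only `M`, is what makes exact removal possible in this sketch.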

Paper Structure (37 sections, 7 equations, 6 figures, 14 tables, 1 algorithm)

Figures (6)

  • Figure 1: Larimar architecture. $X$ and $X_{query}$ denote the data input and the query, respectively; $Z$, $Z_{query}$, and $Z_r$ are the latent vectors; and $M$ is the fixed-size memory. $W$ and $W_0$ are the reading and writing weights to memory, respectively. $W_M$ maps the readout from memory to the decoder.
  • Figure 2: (a) Batch editing accuracy on the CounterFact dataset. Baseline performances are taken from Meng et al. (2023). Green: MEMIT, Orange: ROME, Magenta: MEND, Black: Larimar-6B. (b) Mean F1 score on a held-out set of unseen rephrasings from ZsRE over a sequence of 3000 edits, showing that Larimar generalizes better than GRACE on two datasets with $1000$ or $511$ independent facts ($10$ and $\approx20$ rephrasings per fact, respectively).
  • Figure 3: Batch editing on the CounterFact dataset. Baseline performances are taken from Meng et al. (2023). Green: MEMIT, Orange: ROME, Magenta: MEND, Black: Larimar-6B.
  • Figure 4: Batch editing on the CounterFact dataset with different numbers of memory slots $K$.
  • Figure 5: Mean F1 score of Larimar, comparing two choices for computing the reading and writing weights (the Gaussian convolution defined in the main text and the pseudoinverse method of Pham et al. (2021)) on held-out sets of unseen rephrasings from ZsRE over a sequence of 3000 edits. (The black curves are those shown in Figure 2(b) in the main text.)
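Figure 2(b) and Figure 5 report a mean F1 score over held-out rephrasings. Assuming the common SQuAD-style token-overlap F1 (lowercased whitespace tokens; the paper's exact normalization may differ), the metric can be sketched as follows. `token_f1` and `mean_f1` are hypothetical helper names, and the example strings are made up.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and a reference (assumed metric)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    common = Counter(pred) & Counter(ref)   # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0                          # also covers empty prediction or reference
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions, references):
    """Average token F1 over paired predictions and references."""
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0


print(token_f1("the Eiffel Tower", "Eiffel Tower"))
print(mean_f1(["Paris", "Rome"], ["Paris", "Berlin"]))
```

Partial credit is the point of this choice: an answer with one extra token, such as "the Eiffel Tower" against "Eiffel Tower", still scores 0.8 rather than 0, which suits free-form generation on rephrased queries.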