Larimar: Large Language Models with Episodic Memory Control
Payel Das, Subhajit Chaudhury, Elliot Nelson, Igor Melnyk, Sarath Swaminathan, Sihui Dai, Aurélie Lozano, Georgios Kollias, Vijil Chenthamarakshan, Jiří Navrátil, Soham Dan, Pin-Yu Chen
TL;DR
Larimar introduces a brain-inspired external episodic memory that augments large language models, enabling one-shot, training-free updates to factual knowledge while preserving performance on prior knowledge. The architecture combines an encoder, a fixed-size memory, and a decoder: memory reads and writes condition generation, and a least-squares memory update keeps edits efficient and gradient-free. Larimar supports selective forgetting and information leakage prevention, and uses a scope-detection mechanism to keep outputs consistent with the memory. Empirically, it achieves competitive single-fact editing results, strong retention under sequential edits, and speed-ups of 8–10× (depending on the base LLM) over competitive baselines, with promising generalization to longer input contexts. The work provides a simple, model-agnostic framework for real-time knowledge updating, with potential impact on deployment, privacy, and bias control in LLM systems.
Abstract
Efficient and accurate updating of knowledge stored in Large Language Models (LLMs) is one of the most pressing research challenges today. This paper presents Larimar, a novel, brain-inspired architecture for enhancing LLMs with a distributed episodic memory. Larimar's memory allows for dynamic, one-shot updates of knowledge without the need for computationally expensive re-training or fine-tuning. Experimental results on multiple fact editing benchmarks demonstrate that Larimar not only attains accuracy comparable to most competitive baselines, even in the challenging sequential editing setup, but also excels in speed, yielding speed-ups of 8-10x depending on the base LLM, and in flexibility, since the proposed architecture is simple, LLM-agnostic, and hence general. We further provide mechanisms for selective fact forgetting, information leakage prevention, and input context length generalization with Larimar and show their effectiveness. Our code is available at https://github.com/IBM/larimar
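To make the idea of a one-shot, gradient-free memory update more concrete, the following is a minimal NumPy sketch of a least-squares write to, and read from, a fixed-size memory. It is an illustration under stated assumptions, not the authors' implementation: the function names `write_memory` and `read_memory`, the matrix shapes, and the use of a plain pseudo-inverse (with no noise terms or learned priors) are all hypothetical choices made for this example.

```python
import numpy as np


def write_memory(Z, M0):
    """One-shot, gradient-free write (illustrative sketch, not the paper's code).

    Z  : (N, C) array of encoded facts (assumed to come from some encoder).
    M0 : (K, C) array, the prior memory with a fixed number of K slots.
    """
    # Addressing weights: least-squares solution of min_W ||Z - W @ M0||.
    W0 = Z @ np.linalg.pinv(M0)      # shape (N, K)
    # Updated memory: least-squares solution of min_M ||Z - W0 @ M||.
    M = np.linalg.pinv(W0) @ Z       # shape (K, C), same size as M0
    return M, W0


def read_memory(Zq, M):
    """Read back the memory content that best matches the query encodings Zq."""
    # Addressing weights for the query, again via least squares.
    W = Zq @ np.linalg.pinv(M)       # shape (n, K)
    return W @ M                     # shape (n, C)
```

In this sketch the memory keeps its shape no matter how many facts are written, and no gradient step is taken: both the write and the read reduce to pseudo-inverse computations. When the number of written encodings is no larger than the number of slots or the encoding dimension, reading with the written encodings as queries recovers them up to numerical error.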
