$\text{Memory}^3$: Language Modeling with Explicit Memory
Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, Weinan E
TL;DR
$\text{Memory}^3$ addresses the high cost of training and running large language models by externalizing knowledge into explicit memory, a format cheaper than both model parameters and text retrieval-augmented generation (RAG), so that the model's memory forms a hierarchy in which write cost is traded off against read cost. The approach introduces a memory circuitry theory and a sparse, externally stored memory bank, coupled with a two-stage pretraining regime that biases learning toward abstract knowledge while externalizing specific facts. A 2.4B-parameter $\text{Memory}^3$ model outperforms much larger LLMs as well as RAG models and decodes faster than RAG, aided by a memory sparsification mechanism, FAISS-based retrieval, and a vector-quantized memory store. Across general benchmarks and professional tasks, $\text{Memory}^3$ shows competitive or superior results, improved factuality, and notable speedups, suggesting practical pathways to cheaper, scalable LLMs with infinite-context capabilities and easier task adaptation. The work lays groundwork for further refinements in memory formats, dynamic memory management, and end-to-end systems that use explicit memories in real-time deployment.
Abstract
The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowledge externalized to explicit memories, the LLM can enjoy a smaller parameter size, training cost, and inference cost, all proportional to the amount of remaining "abstract knowledge". As a preliminary proof of concept, we train from scratch a 2.4B LLM, which achieves better performance than much larger LLMs as well as RAG models, and maintains higher decoding speed than RAG. The model is named $\text{Memory}^3$, since explicit memory is the third form of memory in LLMs after implicit memory (model parameters) and working memory (context key-values). We introduce a memory circuitry theory to support the externalization of knowledge, and present novel techniques including a memory sparsification mechanism that makes storage tractable and a two-stage pretraining scheme that facilitates memory formation.
