Cognitive Memory in Large Language Models
Lianlei Shan, Shixian Luo, Zezhou Zhu, Yu Yuan, Yong Wu
TL;DR
The paper surveys memory mechanisms in large language models. It distinguishes memory from knowledge and profiling, and argues that memory enables continuity, reduces hallucinations, and boosts efficiency. It categorizes memory into sensory, short-term, and long-term forms and covers text-based memory, KV-cache-based memory, parameter-based memory, and hidden-state-based memory, detailing acquisition, management, and utilization methods across these forms. It highlights concrete techniques such as retrieval-augmented generation, knowledge graphs, LSH, low-rank KV compression, LoRA/MoE/TTT for memory parameterization, and chunk-based hidden-state strategies. The discussion points to practical implications for scalable, context-rich AI and outlines cognitive memory as a future direction for narrowing the gap between human and machine memory.
Abstract
This paper examines memory mechanisms in Large Language Models (LLMs), emphasizing their importance for context-rich responses, reduced hallucinations, and improved efficiency. It categorizes memory into sensory, short-term, and long-term, with sensory memory corresponding to input prompts, short-term memory processing immediate context, and long-term memory implemented via external databases or structures. The text-based memory section covers acquisition (selection and summarization), management (updating, accessing, storing, and resolving conflicts), and utilization (full-text search, SQL queries, semantic search). The KV cache-based memory section discusses selection methods (regularity-based summarization, score-based approaches, special token embeddings) and compression techniques (low-rank compression, KV merging, multimodal compression), along with management strategies such as offloading and shared attention mechanisms. Parameter-based memory methods (LoRA, TTT, MoE) transform memories into model parameters to enhance efficiency, while hidden-state-based memory approaches (chunk mechanisms, recurrent transformers, the Mamba model) improve long-text processing by combining RNN hidden states with current methods. Overall, the paper offers a comprehensive analysis of LLM memory mechanisms, highlighting their significance and future research directions.
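To make the three stages of text-based memory concrete, here is a minimal sketch in Python. It is an illustration only, not a method from the paper: the `TextMemory` class, its `acquire` and `search` methods, the `min_words` threshold, and the "newest timestamp wins" conflict rule are all hypothetical choices. Acquisition is reduced to a simple selection filter, management to updating with conflict resolution, and utilization to keyword-based full-text search.

```python
from dataclasses import dataclass, field


@dataclass
class TextMemory:
    """Toy text-based memory (hypothetical): one entry per key, newest write wins."""
    facts: dict = field(default_factory=dict)  # key -> (text, timestamp)

    def acquire(self, key, text, timestamp, min_words=3):
        # Acquisition (selection): skip entries too short to be informative.
        if len(text.split()) < min_words:
            return False
        # Management (updating / conflict resolution): keep the newer entry.
        old = self.facts.get(key)
        if old is not None and old[1] > timestamp:
            return False
        self.facts[key] = (text, timestamp)
        return True

    def search(self, query):
        # Utilization (full-text search): case-insensitive match on every query word.
        words = query.lower().split()
        return [text for text, _ in self.facts.values()
                if all(w in text.lower() for w in words)]


memory = TextMemory()
memory.acquire("user.city", "The user lives in Paris", timestamp=1)
memory.acquire("user.city", "The user lives in Berlin", timestamp=2)  # conflict: newer entry wins
memory.acquire("user.note", "ok", timestamp=3)                         # too short, not stored
print(memory.search("lives in"))  # prints ['The user lives in Berlin']
```

A real system would likely replace the length filter with summarization and the keyword match with SQL queries or semantic search, which the abstract lists as alternative utilization methods.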
