Cognitive Memory in Large Language Models

Lianlei Shan, Shixian Luo, Zezhou Zhu, Yu Yuan, Yong Wu

TL;DR

The paper provides a comprehensive survey of memory mechanisms in large language models, distinguishing memory from knowledge and profiling and arguing that memory enables continuity, reduces hallucinations, and improves efficiency. It categorizes memory into sensory, short-term, and long-term forms and surveys text-based, KV-cache-based, parameter-based, and hidden-state-based memory, detailing acquisition, management, and utilization methods for each. It highlights concrete techniques such as retrieval-augmented generation, knowledge graphs, locality-sensitive hashing (LSH), low-rank KV compression, LoRA/MoE/TTT for memory parameterization, and chunk-based hidden-state strategies, illustrating a broad landscape of memory architectures. The discussion points to practical implications for scalable, context-rich AI and outlines cognitive memory as a future direction for narrowing the gap between human memory and machine memory.
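The summary lists locality-sensitive hashing (LSH) among the retrieval techniques. As a minimal sketch, assuming stored memories are embedding vectors and that random-hyperplane LSH (one common LSH family for cosine similarity) is used, a lookup hashes the query into a bucket and ranks only the memories in that bucket. The class `HyperplaneLSH`, its parameters, and the toy memories are hypothetical illustrations, not the survey's own method.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class HyperplaneLSH:
    """Hypothetical memory index using random-hyperplane LSH (cosine similarity)."""

    def __init__(self, dim, n_planes=8, seed=0):
        rng = random.Random(seed)
        # Each hyperplane is represented by a random Gaussian normal vector.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets = defaultdict(list)  # signature -> [(memory_id, vector), ...]

    def _signature(self, vec):
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0) for plane in self.planes)

    def add(self, memory_id, vec):
        self.buckets[self._signature(vec)].append((memory_id, vec))

    def query(self, vec, top_k=3):
        # Only memories sharing the query's bucket are scored, then ranked by exact cosine.
        candidates = self.buckets[self._signature(vec)]
        ranked = sorted(candidates, key=lambda item: cosine(vec, item[1]), reverse=True)
        return [memory_id for memory_id, _ in ranked[:top_k]]


# Toy usage with hypothetical 3-dimensional "embeddings".
index = HyperplaneLSH(dim=3)
index.add("user_likes_tea", [1.0, 2.0, 3.0])
index.add("user_dislikes_tea", [-1.0, -2.0, -3.0])
print(index.query([2.0, 4.0, 6.0]))  # same direction as the first memory, so same bucket
```

Because similar vectors usually, but not always, share a bucket, practical systems often use several hash tables to raise recall; a single table keeps this sketch short.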

Abstract

This paper examines memory mechanisms in Large Language Models (LLMs), emphasizing their importance for context-rich responses, reduced hallucinations, and improved efficiency. It categorizes memory into three types: sensory memory, corresponding to input prompts; short-term memory, which processes the immediate context; and long-term memory, implemented via external databases or structures. The text-based memory section covers acquisition (selection and summarization), management (updating, accessing, storing, and resolving conflicts), and utilization (full-text search, SQL queries, and semantic search). The KV-cache-based memory section discusses selection methods (regularity-based summarization, score-based approaches, and special token embeddings) and compression techniques (low-rank compression, KV merging, and multimodal compression), along with management strategies such as offloading and shared attention mechanisms. Parameter-based memory methods (LoRA, TTT, MoE) transform memories into model parameters to improve efficiency, while hidden-state-based memory approaches (chunk mechanisms, recurrent transformers, and the Mamba model) improve long-text processing by combining RNN-style hidden states with current methods. Overall, the paper offers a comprehensive analysis of LLM memory mechanisms, highlighting their significance and future research directions.
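Score-based KV-cache selection keeps only the cached entries judged most important. A minimal sketch, assuming the score is the accumulated attention each cached token has received and that a window of the most recent tokens is always retained (a heavy-hitter-style policy, not necessarily the specific method the survey describes), could look like this; the function `select_kv` and its parameters are hypothetical.

```python
def select_kv(attn_scores, budget, recent):
    """Return sorted indices of the KV-cache entries to keep (hypothetical helper).

    attn_scores: accumulated attention weight each cached token has received.
    budget: maximum number of KV entries to retain.
    recent: number of most recent tokens that are always kept.
    """
    n = len(attn_scores)
    if n <= budget:
        return list(range(n))  # cache already fits: nothing to evict
    recent = min(recent, budget)
    window = set(range(n - recent, n))  # local window of the latest tokens
    older = [i for i in range(n) if i not in window]
    # Spend the remaining budget on the older tokens with the highest scores.
    heavy = sorted(older, key=lambda i: attn_scores[i], reverse=True)[: budget - recent]
    return sorted(window | set(heavy))


# Toy example: 6 cached tokens, keep 4 of them, always keep the last 2.
scores = [0.9, 0.1, 0.5, 0.05, 0.2, 0.3]
kept = select_kv(scores, budget=4, recent=2)
print(kept)  # [0, 2, 4, 5]
```

Keeping a recent window is one way to avoid evicting new tokens whose scores are low only because few queries have attended to them yet.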

Paper Structure

This paper contains 56 sections, 31 equations, and 9 figures.

Figures (9)

  • Figure 1: Short Memory
  • Figure 2: Long Memory
  • Figure 3: Text-based memory acquisition.
  • Figure 4: Text-based memory management
  • Figure 5: Text-based memory utilization
  • ...and 4 more figures