Multi-Layered Memory Architectures for LLM Agents: An Experimental Evaluation of Long-Term Context Retention

Sunil Tiwari, Payal Fofadiya

Abstract

Long-horizon dialogue systems suffer from semantic drift and unstable memory retention across extended sessions. This paper presents a Multi-Layer Memory Framework that decomposes dialogue history into working, episodic, and semantic layers with adaptive retrieval gating and retention regularization. The architecture controls cross-session drift while maintaining bounded context growth and computational efficiency. Experiments on LOCOMO, LOCCO, and LoCoMo show improved performance, achieving a 46.85 success rate, 0.618 overall F1 (0.594 multi-hop F1), and 56.90% six-period retention, while reducing the false memory rate to 5.1% and context usage to 58.40%. Results confirm enhanced long-term retention and reasoning stability under constrained context budgets.

Paper Structure

This paper contains 14 sections, 10 equations, 4 figures, 5 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Descriptive overview of the multi-layer memory framework showing dialogue processing, layered memory consolidation, adaptive retrieval, retention control, and response generation for stable long-horizon interactions.
  • Figure 2: Retention after six temporal periods on LOCCO. MLMF achieves higher long-term retention than the baseline of Jia et al. [jia2025evaluating].
  • Figure 3: F1 score comparison across representative memory architectures. The dotted line indicates the strongest baseline performance.
  • Figure 4: Ablation analysis of MLMF components across F1, retention, and false memory rate. The full model consistently outperforms its reduced variants, confirming the contribution of semantic consolidation, episodic accumulation, adaptive gating, and retention regularization.