HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling

Xiaochen Zhao, Kaikai Wang, Xiaowen Zhang, Chen Yao, Aili Wang

TL;DR

HyMem addresses the efficiency–accuracy trade-off in long-context reasoning with a dual-granularity memory architecture and dynamic retrieval scheduling inspired by cognitive economy. It stores Level-1 summaries alongside Level-2 raw content and answers queries with three components: a lightweight memory module, a selectively activated deep memory module, and a reflection module for iterative reasoning. On the LOCOMO and LongMemEval benchmarks, HyMem reports a state-of-the-art balance of efficiency and performance, including a 92.6% reduction in token cost relative to full-context baselines. The results suggest that adaptive, memory-aware retrieval is practical for robust, scalable long-term dialogue agents.

Abstract

Large language model (LLM) agents demonstrate strong performance in short-text contexts but often underperform in extended dialogues due to inefficient memory management. Existing approaches face a fundamental trade-off between efficiency and effectiveness: memory compression risks losing critical details required for complex reasoning, while retaining raw text introduces unnecessary computational overhead for simple queries. The crux lies in the limitations of monolithic memory representations and static retrieval mechanisms, which fail to emulate the flexible and proactive memory scheduling capabilities observed in humans and therefore struggle to adapt to diverse problem scenarios. Inspired by the principle of cognitive economy, we propose HyMem, a hybrid memory architecture that enables dynamic on-demand scheduling through multi-granular memory representations. HyMem adopts a dual-granular storage scheme paired with a dynamic two-tier retrieval system: a lightweight module constructs summary-level context for efficient response generation, while an LLM-based deep module is selectively activated only for complex queries, augmented by a reflection mechanism for iterative reasoning refinement. Experiments show that HyMem achieves strong performance on both the LOCOMO and LongMemEval benchmarks, outperforming the full-context baseline while reducing computational cost by 92.6%, establishing a state-of-the-art balance between efficiency and performance in long-term memory management.
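
The two-tier retrieval idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`answer_query`, `toy_generate`, `toy_is_sufficient`), the sufficiency check, and the choice to feed a rejected draft back into the context are all assumptions made for illustration, and the toy callables stand in for the LLM calls the paper describes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Answer:
    text: str
    tier: str  # "light" if summaries sufficed, "deep" if raw content was needed


def answer_query(
    query: str,
    summaries: List[str],
    raw_units: List[str],
    generate: Callable[[str, List[str]], str],
    is_sufficient: Callable[[str, str], bool],
    max_rounds: int = 2,
) -> Answer:
    """Hypothetical scheduler: try the cheap path first, escalate only if needed."""
    # Tier 1 (lightweight): answer from summary-level context only.
    draft = generate(query, summaries)
    if is_sufficient(query, draft):
        return Answer(draft, "light")

    # Tier 2 (deep): rebuild context from raw content, then refine the
    # draft for a bounded number of reflection rounds.
    context = list(raw_units)
    for _ in range(max_rounds):
        draft = generate(query, context)
        if is_sufficient(query, draft):
            break
        # Assumption: a rejected draft is appended so the next round can see it.
        context = context + [draft]
    return Answer(draft, "deep")


def toy_generate(query: str, context: List[str]) -> str:
    """Stand-in for an LLM: return the first context line sharing a word with the query."""
    query_words = set(query.lower().split())
    for line in context:
        if query_words & set(line.lower().split()):
            return line
    return "unknown"


def toy_is_sufficient(query: str, draft: str) -> bool:
    """Stand-in for a reflection check: any answer other than 'unknown' passes."""
    return draft != "unknown"
```

With these stand-ins, a query that the summaries can answer never touches the raw content, while a query about a detail missing from the summaries escalates to the deep path. That is the cost asymmetry the abstract attributes to selective activation.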

Paper Structure

This paper contains 32 sections, 1 equation, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: Conventional lightweight methods struggle with complex tasks, while sophisticated approaches incur high overhead for simple queries. In contrast, our HyMem dynamically allocates memory resources based on task demands, achieving dual optimization of performance and efficiency.
  • Figure 2: Our approach demonstrates superior efficiency, achieving the best balance between performance and computational cost (measured in tokens) on the LOCOMO benchmark.
  • Figure 3: (a) Workflow of the Memory Storage Module. This diagram illustrates the complete pipeline for constructing the dual-layer memory structure: generating core summaries (Level-1 memory) from pre-partitioned event units of raw dialogues (Level-2 memory), establishing many-to-one links to support memory backtracking during the recall stage, before final persistent storage in the database. (b) Inference Workflow within the Memory Recall Module. Queries are first routed to a lightweight module, with selective activation of a deep module when necessary, followed by iterative optimization and final review by the Reflection Module to produce the output.
  • Figure 4: (a) Workflow of the Lightweight Module. (b) Context Reconstruction via the Deep Memory Module by Backtracking from Level-1 to Level-2 Memory. (c) Process of Review and Iterative Optimization by the Reflection Module.
  • Figure 5: Comparison of performance and average token usage on the four LOCOMO task categories for Naive RAG with different retrieval $k$ values, Full Context, and our method.
  • ...and 1 more figure
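
The dual-layer storage and backtracking described in the captions of Figures 3 and 4 can be sketched in Python as follows. This is an illustrative sketch only: the class `MemoryStore`, its method names, and the reading of "many-to-one" as several Level-1 summaries pointing to one Level-2 event unit are assumptions, and the `summarize` callable stands in for whatever summarizer the paper uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class MemoryStore:
    """Hypothetical two-level store: raw event units plus linked summaries."""
    level2: Dict[int, str] = field(default_factory=dict)          # unit id -> raw dialogue text
    level1: List[Tuple[str, int]] = field(default_factory=list)   # (summary, unit id)

    def add_unit(self, raw_text: str, summarize: Callable[[str], List[str]]) -> int:
        """Store one pre-partitioned event unit and link each of its summaries to it."""
        unit_id = len(self.level2)
        self.level2[unit_id] = raw_text
        for summary in summarize(raw_text):
            # Many-to-one link (assumed direction): many summaries, one raw unit.
            self.level1.append((summary, unit_id))
        return unit_id

    def backtrack(self, summary_hits: List[str]) -> List[str]:
        """Follow links from matched Level-1 summaries back to distinct Level-2 units."""
        unit_ids: List[int] = []
        for summary, unit_id in self.level1:
            if summary in summary_hits and unit_id not in unit_ids:
                unit_ids.append(unit_id)
        return [self.level2[i] for i in unit_ids]
```

Because the links are stored with each summary, two summary hits from the same event unit resolve to a single raw passage, so the reconstructed context does not repeat that unit.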