G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

Guibin Zhang, Muxin Fu, Guancheng Wan, Miao Yu, Kun Wang, Shuicheng Yan

TL;DR

The paper addresses inadequate memory in LLM-based multi-agent systems (MAS) by introducing G-Memory, a three-tier graph memory (insight, query, and interaction graphs) that lets agent teams improve across tasks. For each new query, it uses bi-directional memory traversal to retrieve high-level insights and condensed interaction trajectories, then updates the hierarchy after the task to retain group knowledge. Results across five benchmarks, three MAS frameworks, and three LLM backbones show gains of up to ~20.9% in embodied-action success rate and ~10.1% in knowledge-QA accuracy with modest token-cost overhead. The framework is designed as a plug-in for existing MAS; the authors note limitations, including the need for broader-domain validation and deployment safeguards.

Abstract

Large language model (LLM)-powered multi-agent systems (MAS) have demonstrated cognitive and execution capabilities that far exceed those of single LLM agents, yet their capacity for self-evolution remains hampered by underdeveloped memory architectures. Upon close inspection, we are alarmed to discover that prevailing MAS memory mechanisms (1) are overly simplistic, completely disregarding the nuanced inter-agent collaboration trajectories, and (2) lack cross-trial and agent-specific customization, in stark contrast to the expressive memory developed for single agents. To bridge this gap, we introduce G-Memory, a hierarchical, agentic memory system for MAS inspired by organizational memory theory, which manages the lengthy MAS interaction via a three-tier graph hierarchy: insight, query, and interaction graphs. Upon receiving a new user query, G-Memory performs bi-directional memory traversal to retrieve both $\textit{high-level, generalizable insights}$ that enable the system to leverage cross-trial knowledge, and $\textit{fine-grained, condensed interaction trajectories}$ that compactly encode prior collaboration experiences. Upon task execution, the entire hierarchy evolves by assimilating new collaborative trajectories, nurturing the progressive evolution of agent teams. Extensive experiments across five benchmarks, three LLM backbones, and three popular MAS frameworks demonstrate that G-Memory improves success rates in embodied action and accuracy in knowledge QA by up to $20.89\%$ and $10.12\%$, respectively, without any modifications to the original frameworks. Our code is available at https://github.com/bingreeky/GMemory.
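
The retrieve-then-update cycle described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the class name `ToyGMemory`, its fields, and the token-overlap similarity `jaccard` are all hypothetical stand-ins chosen for illustration, and the sketch assumes the caller supplies an already-condensed trajectory and an already-written insight (steps the paper presumably delegates to the agents themselves). It only shows the shape of the idea: select similar past queries, expand over the query graph, move upward to insights, and move downward to stored trajectories.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for an embedding score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


@dataclass
class ToyGMemory:
    """Hypothetical three-tier memory: query graph, trajectories, insights."""

    # Query tier: query text -> neighbouring query texts (undirected edges).
    query_edges: dict[str, set[str]] = field(default_factory=dict)
    # Interaction tier: query text -> condensed list of agent utterances.
    trajectories: dict[str, list[str]] = field(default_factory=dict)
    # Insight tier: insight text -> queries that support that insight.
    insights: dict[str, set[str]] = field(default_factory=dict)

    def _top_k(self, query: str, k: int) -> list[str]:
        # Rank stored queries by similarity (ties keep insertion order).
        ranked = sorted(self.query_edges,
                        key=lambda q: jaccard(q, query), reverse=True)
        return ranked[:k]

    def retrieve(self, new_query: str, k: int = 1, hops: int = 1):
        # Coarse step: the k most similar past queries.
        selected = set(self._top_k(new_query, k))
        # Hop expansion: add neighbours in the query graph, `hops` times.
        for _ in range(hops):
            selected |= {n for q in selected for n in self.query_edges[q]}
        # Upward traversal: insights supported by any selected query.
        found_insights = sorted(
            text for text, support in self.insights.items() if support & selected
        )
        # Downward traversal: stored trajectories of the selected queries.
        found_trajs = {q: self.trajectories[q] for q in sorted(selected)}
        return found_insights, found_trajs

    def update(self, query: str, trajectory: list[str], insight: str, k: int = 1):
        # Link the finished task to its most similar past queries (if any overlap).
        neighbours = {q for q in self._top_k(query, k) if jaccard(q, query) > 0}
        self.query_edges[query] = set(neighbours)
        for n in neighbours:
            self.query_edges[n].add(query)
        # Store the trajectory and attach the query to the insight it supports.
        self.trajectories[query] = list(trajectory)
        self.insights.setdefault(insight, set()).add(query)


# Usage: three finished tasks, then retrieval for a new, related query.
mem = ToyGMemory()
mem.update("put a clean mug on the desk",
           ["solver: find mug", "solver: clean mug at sink"],
           "clean objects at the sink before placing them")
mem.update("put a clean plate on the desk",
           ["solver: find plate", "solver: clean plate at sink"],
           "clean objects at the sink before placing them")
mem.update("who wrote Dracula",
           ["searcher: look up author"],
           "verify answers against a retrieved source")

found_insights, found_trajs = mem.retrieve("put a clean bowl on the desk", k=1, hops=1)
_, direct_only = mem.retrieve("put a clean bowl on the desk", k=1, hops=0)
```

With `hops=0` only the single best-matching query contributes its trajectory; with `hops=1` its neighbour in the query graph is pulled in as well, while the unrelated question-answering task and its insight stay out of the result.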

Paper Structure

This paper contains 45 sections, 13 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: (Left) We report the token cost of several single-agent and MAS baselines on ALFWorld benchmark; (Right) The overview of G-Memory's three-tier hierarchical memory architecture, encompassing the insight graph, query graph and interaction (utterance) graph.
  • Figure 2: The overview of our proposed G-Memory.
  • Figure 3: Cost analysis of G-Memory. We showcase the performance versus the overall system token cost when combined with different memory architectures.
  • Figure 4: (a) Sensitivity analysis of the hop expansion in \ref{eq:hop_expansion}; (b) Sensitivity analysis of the number of selected queries $k$ in \ref{eq:similarity}; (c) We study two variants of G-Memory: merely providing high-level insights (i.e., the insights $\mathcal{I}^\mathcal{S}$ in \ref{eq:upward_retrieval}) or fine-grained interactions (i.e., the core trajectories in \ref{eq:downward_retrieval}). All the experiments here are done with Qwen-2.5-14b.
  • Figure 5: Case study of G-Memory.
  • ...and 6 more figures