GraphCogent: Mitigating LLMs' Working Memory Constraints via Multi-Agent Collaboration in Complex Graph Understanding

Rongzheng Wang; Shuang Liang; Qizhi Chen; Yihong Huang; Muquan Li; Yizhuo Ma; Dongyang Zhang; Ke Qin; Man-Fai Leung

TL;DR

GraphCogent addresses the working-memory bottlenecks of LLMs in real-world graph reasoning by adopting a cognitively inspired sensory-buffer-execution architecture, and it introduces Graph4real, a large-scale, domain-diverse benchmark. The framework decomposes graph reasoning into perception (Sensory), integration (Buffer), and action (Execution), and combines tool calling with tool creation to handle diverse graph representations and dynamic tasks. Key contributions include a Graph N-back memory test, a Graph Verifier that ensures the reliability of representation transformations, a cross-format Buffer Module, and a two-stage Execution Agent that pairs CMPO-guided tool discrimination with a Tool Creator for on-demand tool synthesis. Experiments across four real-world domains show robust, scalable performance, with high accuracy, substantial token savings, and strong cross-dataset generalization, underscoring the framework's practical value for real-world graph reasoning.
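To make the sensory-to-buffer hand-off more concrete, the following is a minimal sketch, not the authors' implementation. It assumes graphs arrive as lines of text in a few hypothetical surface forms (`A -> B`, `A connects to B`, `(A, B)`), normalizes them into one canonical edge list, and uses a simple stand-in for the Graph Verifier: a candidate conversion (for example, one produced by an LLM) is accepted only if it matches a rule-based parse of the same text. The names `normalize_edges`, `verify_transformation`, and `PATTERNS` are illustrative, as are the regular expressions.

```python
import re
from collections import Counter
from typing import List, Tuple

Edge = Tuple[str, str]

# Hypothetical surface forms; the formats the real Sensory Module handles
# are not listed in this summary.
PATTERNS = [
    re.compile(r"^\s*(\w+)\s*->\s*(\w+)\s*$"),           # "A -> B"
    re.compile(r"^\s*(\w+)\s+connects to\s+(\w+)\s*$"),  # "A connects to B"
    re.compile(r"^\s*\(\s*(\w+)\s*,\s*(\w+)\s*\)\s*$"),  # "(A, B)"
]


def normalize_edges(lines: List[str]) -> Tuple[List[Edge], List[str]]:
    """Map each text line to a canonical (source, target) pair.

    Returns the parsed edges and any lines that matched no known pattern.
    """
    edges: List[Edge] = []
    unparsed: List[str] = []
    for line in lines:
        for pattern in PATTERNS:
            match = pattern.match(line)
            if match:
                edges.append((match.group(1), match.group(2)))
                break
        else:  # no pattern matched this line
            unparsed.append(line)
    return edges, unparsed


def verify_transformation(lines: List[str], candidate: List[Edge]) -> bool:
    """Illustrative Graph Verifier stand-in.

    Accepts a candidate edge list (for example, one produced by an LLM) only
    if it equals a rule-based parse of the same text, counting duplicates.
    """
    reference, unparsed = normalize_edges(lines)
    if unparsed:
        return False  # lines outside the known formats cannot be checked
    return Counter(candidate) == Counter(reference)


if __name__ == "__main__":
    text = ["A -> B", "B connects to C", "(C, D)"]
    edges, unparsed = normalize_edges(text)
    print(edges)  # [('A', 'B'), ('B', 'C'), ('C', 'D')]
    # A candidate that silently drops the last edge is rejected.
    print(verify_transformation(text, [("A", "B"), ("B", "C")]))  # False
```

Comparing against a deterministic parse is only one possible verification strategy; this summary does not say how the actual Graph Verifier checks a transformation.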

Abstract

Large language models (LLMs) show promising performance on small-scale graph reasoning tasks but fail when handling real-world graphs with complex queries. This failure stems from LLMs' working memory constraints: they cannot retain long-range graph topology over extended contexts while sustaining coherent multi-step reasoning. The limitation is especially damaging because real-world graphs, such as Web, Transportation, Social, and Citation networks, are often structurally complex. To address these limitations, we propose GraphCogent, a collaborative agent framework inspired by the human Working Memory Model that decomposes graph reasoning into specialized cognitive processes: sense, buffer, and execute. The framework consists of three modules: the Sensory Module standardizes diverse graph text representations via subgraph sampling, the Buffer Module integrates and indexes graph data across multiple formats, and the Execution Module combines tool calling and tool creation for efficient reasoning. We also introduce Graph4real, a comprehensive benchmark that contains real-world graphs from four domains (Web, Transportation, Social, and Citation) to evaluate LLMs' graph reasoning capabilities. Graph4real covers 21 graph reasoning tasks, categorized into three types (Structural Querying, Algorithmic Reasoning, and Predictive Modeling), with graph scales up to 10 times larger than those of existing benchmarks. Experiments show that Llama3.1-8B-based GraphCogent achieves a 50% improvement over massive-scale LLMs such as DeepSeek-R1 (671B). Compared to the state-of-the-art agent-based baseline, our framework improves accuracy by 20% while reducing token usage by 80% for in-toolset tasks and 30% for out-toolset tasks. Code will be available after review.
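The split between a Buffer that indexes one graph in several formats and an Execution Module that either calls an existing tool or creates a new one can be sketched as follows. This is a hypothetical illustration under stated assumptions, not GraphCogent's code: the buffer keeps an edge list, an adjacency list, and an integer node index; the toolset holds two illustrative tools (`shortest_path_length` via breadth-first search and `degree`); and tool creation, which the paper attributes to a Tool Creator, is modeled here as any user-supplied callback. The classes `GraphBuffer` and `ExecutionAgent` and the callback `toy_creator` are invented names.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

Edge = Tuple[str, str]
Tool = Callable[..., object]


class GraphBuffer:
    """Hypothetical buffer holding one graph in several indexed formats."""

    def __init__(self, edges: List[Edge], directed: bool = False) -> None:
        self.edges = list(edges)                  # edge-list view
        self.adj: Dict[str, List[str]] = {}       # adjacency-list view
        for u, v in self.edges:
            self.adj.setdefault(u, []).append(v)
            self.adj.setdefault(v, [])
            if not directed:
                self.adj[v].append(u)
        # Integer index per node, e.g. for building a matrix view on demand.
        self.index = {node: i for i, node in enumerate(sorted(self.adj))}


def shortest_path_length(buf: GraphBuffer, source: str, target: str) -> Optional[int]:
    """Unweighted shortest path via BFS; None if target is unreachable."""
    if source not in buf.adj or target not in buf.adj:
        return None
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nxt in buf.adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


def degree(buf: GraphBuffer, node: str) -> int:
    """Number of adjacency-list entries (out-degree if the graph is directed)."""
    return len(buf.adj.get(node, []))


class ExecutionAgent:
    """Call a registered tool if one exists; otherwise create one on demand."""

    def __init__(self, buf: GraphBuffer, create_tool: Callable[[str], Tool]) -> None:
        self.buf = buf
        self.tools: Dict[str, Tool] = {
            "shortest_path_length": shortest_path_length,
            "degree": degree,
        }
        # Assumed stand-in for the Tool Creator: any callable that turns a
        # task name into a tool taking the buffer as its first argument.
        self.create_tool = create_tool

    def run(self, task: str, **kwargs: object) -> object:
        if task not in self.tools:                      # out-of-toolset task
            self.tools[task] = self.create_tool(task)   # cache for reuse
        return self.tools[task](self.buf, **kwargs)


if __name__ == "__main__":
    def toy_creator(task: str) -> Tool:
        # Hypothetical: a real system would synthesize code for the task.
        if task == "edge_count":
            return lambda buf: len(buf.edges)
        raise ValueError(f"cannot create a tool for {task!r}")

    agent = ExecutionAgent(GraphBuffer([("A", "B"), ("B", "C"), ("C", "D")]), toy_creator)
    print(agent.run("shortest_path_length", source="A", target="D"))  # 3
    print(agent.run("degree", node="B"))  # 2
    print(agent.run("edge_count"))  # 3
```

Caching the created tool means a repeated out-of-toolset task pays the creation cost only once; whether GraphCogent caches created tools is not stated in this summary.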
