Brain-Inspired Graph Multi-Agent Systems for LLM Reasoning

Guangfu Hao, Yuming Dai, Xianzhe Qin, Shan Yu

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of language tasks, yet complex multi-step reasoning remains a fundamental challenge. While Large Reasoning Models (LRMs) equipped with extended chain-of-thought mechanisms demonstrate improved performance over standard LLMs, both model types still suffer from accuracy collapse on sufficiently complex tasks, suggesting that scaling model-level reasoning alone is insufficient. Inspired by the global workspace theory of human cognition, we propose Brain-Inspired Graph Multi-Agent Systems (BIGMAS), in which specialized LLM agents are organized as nodes in a dynamically constructed directed graph and coordinate exclusively through a centralized shared workspace. A problem-adaptive GraphDesigner constructs task-specific agent topologies, while a global Orchestrator leverages the complete shared state for routing decisions, overcoming the local-view bottleneck of reactive approaches. Experiments on Game24, Six Fives, and Tower of London across six frontier LLMs demonstrate that BIGMAS consistently improves reasoning performance for both standard LLMs and LRMs, outperforming existing multi-agent baselines including ReAct and Tree of Thoughts, showing that multi-agent architectural design provides complementary gains orthogonal to model-level reasoning enhancements.

Paper Structure (23 sections, 6 equations, 9 figures, 1 table, 1 algorithm)


Figures (9)

  • Figure 1: Three cognitive reasoning tasks used in evaluation. Left: The Tower of London task requires planning optimal moves to reach a target configuration. Middle: Six Fives requires constructing arithmetic expressions using exactly six 5s to reach a target value. Right: Game24 demands mathematical reasoning to combine four numbers to reach the target value 24.
  • Figure 2: Overview of the BIGMAS framework. (a) Graph Design: A GraphDesigner agent $\mathcal{D}$ analyzes the problem instance $\mathcal{P} = (x, \mathcal{C}, y^*)$ and produces a task-specific directed agent graph $\mathcal{G}$ together with a Workspace contract $\kappa$. (b) Workspace $\mathcal{B}$: A centralized shared workspace partitioned into read-only context $\mathcal{B}_{\text{ctx}}$, read-write working area $\mathcal{B}_{\text{work}}$, system metadata $\mathcal{B}_{\text{sys}}$, and sink-only answer store $\mathcal{B}_{\text{ans}}$; all agent nodes interact exclusively through $\mathcal{B}$. (c) Graph Execution: Each active node $v_t$ produces a structured write instruction $\omega_t = (\pi, \alpha, \delta)$ via an LLM call; the instruction is validated against $\mathcal{B}$ and $\mathcal{G}$, with a self-correction loop on failure. A global Orchestrator routes execution based on the complete workspace state; the system halts when the sink node $v_{\text{snk}}$ is reached or the step budget $T_{\max}$ is exhausted.
  • Figure 3: Grouped bar chart comparing the accuracy (%) of six LLMs on three reasoning benchmarks under two conditions: Base LLM (solid bars) and BIGMAS (hatched bars). The six models evaluated are DeepSeek-V3.2, DeepSeek-V3.2 (+thinking), Claude 4.5 Sonnet, Claude 4.5 (+thinking), Gemini 2.5 Pro, and GPT-5. The three tasks are Game24 (arithmetic reasoning), Six Fives (constrained expression generation), and Tower of London (multi-step planning). For every model and task, BIGMAS matches or improves upon the corresponding base-LLM accuracy, with particularly large gains on weaker base models (e.g., DeepSeek-V3.2 and Claude 4.5 Sonnet) and smaller but still positive gains on already strong models (e.g., GPT-5 and Gemini 2.5 Pro).
  • Figure 4: Distribution of graph complexity across three reasoning tasks, shown as violin plots overlaid with jittered individual observations. The left panel reports the number of nodes and the right panel the number of directed edges in each agent graph automatically designed by the GraphDesigner.
  • Figure 5: Representative agent graph structures automatically designed for each of the three reasoning tasks. Each panel displays the highest-node-count graph produced by the system for that task; all three instances were solved correctly (indicated by ✓).
  • ...and 4 more figures
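
The partitioned workspace and validated write instruction described in the Figure 2 caption can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the authors' implementation: the class and function names, the reading of $\omega_t = (\pi, \alpha, \delta)$ as (path, action, data), the `set`/`append` action vocabulary, and the rule that agents may not write to the system-metadata partition. Only the four partitions and their read-only, read-write, and sink-only roles come from the caption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Hypothetical partition names mirroring B_ctx, B_work, B_sys, B_ans.
PARTITIONS = ("ctx", "work", "sys", "ans")


@dataclass
class Workspace:
    """Shared workspace B split into the four partitions named in Figure 2."""
    ctx: Dict[str, Any] = field(default_factory=dict)   # read-only problem context
    work: Dict[str, Any] = field(default_factory=dict)  # read-write working area
    sys: Dict[str, Any] = field(default_factory=dict)   # system metadata
    ans: Dict[str, Any] = field(default_factory=dict)   # answer store, sink-only


@dataclass
class WriteInstruction:
    """Assumed reading of omega = (pi, alpha, delta) as (path, action, data)."""
    path: str    # e.g. "work.candidates" (assumed "<partition>.<key>" format)
    action: str  # "set" or "append" (assumed action vocabulary)
    data: Any


def validate(instr: WriteInstruction, node: str, sink: str) -> Optional[str]:
    """Return an error message, or None if the write is allowed."""
    partition, _, key = instr.path.partition(".")
    if partition not in PARTITIONS or not key:
        return f"bad path: {instr.path!r}"
    if partition == "ctx":
        return "ctx is read-only"
    if partition == "sys":
        # Assumption: metadata is maintained by the runtime, not by agents.
        return "sys is reserved for the runtime"
    if partition == "ans" and node != sink:
        return "only the sink node may write to ans"
    if instr.action not in ("set", "append"):
        return f"unknown action: {instr.action!r}"
    return None


def apply_write(ws: Workspace, instr: WriteInstruction,
                node: str, sink: str) -> Optional[str]:
    """Validate, then apply. On failure the workspace is left unchanged."""
    error = validate(instr, node, sink)
    if error is not None:
        # A caller could feed this message back to the agent to retry,
        # which is one way to realize a self-correction loop.
        return error
    partition, _, key = instr.path.partition(".")
    target = getattr(ws, partition)
    if instr.action == "set":
        target[key] = instr.data
    else:
        target.setdefault(key, []).append(instr.data)
    return None
```

In this sketch a rejected write returns an error string instead of raising, so a driver loop can pass the message back to the originating agent and retry within its step budget.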