LLM-Powered Decentralized Generative Agents with Adaptive Hierarchical Knowledge Graph for Cooperative Planning

Hanqing Yang, Jingdi Chen, Marie Siew, Tania Lorido-Botran, Carlee Joe-Wong

TL;DR

This work addresses the challenge of scalable, long-horizon cooperative planning in decentralized multi-agent systems operating in open-world environments. It proposes DAMCS, a framework that combines an Adaptive Knowledge Graph Memory System (A-KGMS) with a Structured Communication System (S-CS) to empower LLM-powered agents to plan, reason, and collaborate without centralized long-term control, formalized within a Dec-POMDP $D=\big\langle I, n, S, A, P, \Omega, O, g, R \big\rangle$. To evaluate the approach, the authors introduce Multi-Agent Crafter (MAC), an open-world testbed supporting arbitrary agent counts and resource-sharing tasks that stress macro-management and coordination. Empirical results show that DAMCS outperforms traditional MARL and pure LLM baselines, with significant reductions in steps to obtain a diamond across 2- and 6-agent settings; ablations confirm the critical roles of memory integration and structured communication in achieving cooperative planning and efficiency. The findings demonstrate the practical impact of integrating hierarchical knowledge graphs and disciplined inter-agent messaging for scalable, decentralized cooperation in dynamic environments, with potential implications for real-world multi-agent systems and interactive AI planning.

Abstract

Developing intelligent agents for long-term cooperation in dynamic open-world scenarios is a major challenge in multi-agent systems. Traditional Multi-agent Reinforcement Learning (MARL) frameworks like centralized training decentralized execution (CTDE) struggle with scalability and flexibility. They require centralized long-term planning, which is difficult without custom reward functions, and face challenges in processing multi-modal data. CTDE approaches also assume fixed cooperation strategies, making them impractical in dynamic environments where agents need to adapt and plan independently. To address decentralized multi-agent cooperation, we propose Decentralized Adaptive Knowledge Graph Memory and Structured Communication System (DAMCS) in a novel Multi-agent Crafter environment. Our generative agents, powered by Large Language Models (LLMs), are more scalable than traditional MARL agents by leveraging external knowledge and language for long-term planning and reasoning. Instead of fully sharing information from all past experiences, DAMCS introduces a multi-modal memory system organized as a hierarchical knowledge graph and a structured communication protocol to optimize agent cooperation. This allows agents to reason from past interactions and share relevant information efficiently. Experiments on novel multi-agent open-world tasks show that DAMCS outperforms both MARL and LLM baselines in task efficiency and collaboration. Compared to single-agent scenarios, the two-agent scenario achieves the same goal with 63% fewer steps, and the six-agent scenario with 74% fewer steps, highlighting the importance of adaptive memory and structured communication in achieving long-term goals. We publicly release our project at: https://happyeureka.github.io/damcs.
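
As a quick check on what these percentages imply, assume "fewer steps" is measured relative to the single-agent step count. The symbols $T_1$, $T_2$, and $T_6$ (steps needed with one, two, and six agents) are introduced here for illustration only and do not come from the paper:

$$T_2 = (1 - 0.63)\,T_1 = 0.37\,T_1, \qquad T_6 = (1 - 0.74)\,T_1 = 0.26\,T_1$$

$$\frac{T_1}{T_2} = \frac{1}{0.37} \approx 2.7, \qquad \frac{T_1}{T_6} = \frac{1}{0.26} \approx 3.8, \qquad \frac{T_6}{T_2} = \frac{0.26}{0.37} \approx 0.70$$

Under that reading, two agents reach the goal roughly 2.7 times faster than one, six agents roughly 3.8 times faster, and going from two to six agents removes about a further 30% of the steps.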

Paper Structure

This paper contains 28 sections, 9 figures, 1 table.

Figures (9)

  • Figure 1: The Multi-agent Crafter Environment. Agents spawn in a shared environment and interact to collect a diamond as quickly as possible, terminating the session upon success. To achieve this, they must craft tools in a hierarchical order while maintaining their health stats.
  • Figure 2: Framework Overview. Multiple agents respawn on the map and interact with each other through a memory system and communication protocol, aiming to collect a diamond as fast as possible.
  • Figure 3: Memory System. The system consists of working memory and long-term memory. Sensory inputs (1) are captured in working memory (2), alongside relevant information retrieved from long-term memory (4). The agent 'thinks' using an MLLM (3) to generate responses and action plans, which are then stored in long-term memory. A consolidation process updates the goal-oriented hierarchical knowledge graph (5), linking new experiences to past events. This graph comprises experience nodes $E$, goal nodes $G$, and long-term goal nodes $LTG$.
  • Figure 4: Communication Protocol. Agents collaborate by exchanging messages to coordinate tasks and share resources. An arrow from agent $i$ to agent $j$ indicates that agent $i$ is helping agent $j$; communication then flows in the opposite direction.
  • Figure 5: Evaluation of $n$-RL-trained agents in MAC: Both PPO-trained and MADDPG-trained agents initially show increasing total rewards, indicating active learning. However, they fail to achieve higher rewards as further improvements require acquiring advanced skills in a hierarchical order. Learning remains prohibitively slow for both RL agents.
  • ...and 4 more figures
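
The goal-oriented hierarchical knowledge graph described in the Figure 3 caption can be pictured with a small data-structure sketch. This is not the authors' implementation: the class names, the `add` and `experiences_under` methods, the strict LTG → G → E parent rule, and the example goal and experience strings are all assumptions made for illustration. It only shows one plausible way to link experience nodes $E$ to goal nodes $G$ and long-term goal nodes $LTG$, and to retrieve the experiences relevant to a goal.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed hierarchy (hypothetical): LTG nodes are roots, G nodes hang under
# an LTG node, and E nodes hang under a G node.
PARENT_KIND = {"LTG": None, "G": "LTG", "E": "G"}


@dataclass
class Node:
    node_id: str
    kind: str  # "E", "G", or "LTG"
    text: str
    children: List[str] = field(default_factory=list)


class HierarchicalMemory:
    """Toy long-term memory organized as a goal-oriented hierarchy."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}

    def add(self, node_id: str, kind: str, text: str,
            parent: Optional[str] = None) -> Node:
        """Insert a node and link it to its parent, enforcing the hierarchy."""
        if kind not in PARENT_KIND:
            raise ValueError(f"unknown node kind: {kind}")
        expected = PARENT_KIND[kind]
        if expected is None:
            if parent is not None:
                raise ValueError("LTG nodes are roots and take no parent")
        elif (parent is None or parent not in self.nodes
              or self.nodes[parent].kind != expected):
            raise ValueError(f"{kind} node needs a parent of kind {expected}")
        node = Node(node_id, kind, text)
        self.nodes[node_id] = node
        if parent is not None:
            self.nodes[parent].children.append(node_id)
        return node

    def experiences_under(self, node_id: str) -> List[str]:
        """Collect the text of every experience node below the given node."""
        node = self.nodes[node_id]
        if node.kind == "E":
            return [node.text]
        found: List[str] = []
        for child in node.children:
            found.extend(self.experiences_under(child))
        return found


# Hypothetical usage: the goal and experience strings are made up.
memory = HierarchicalMemory()
memory.add("ltg0", "LTG", "collect a diamond")
memory.add("g0", "G", "craft a pickaxe", parent="ltg0")
memory.add("e0", "E", "collected wood near spawn", parent="g0")
memory.add("e1", "E", "placed a crafting table", parent="g0")
print(memory.experiences_under("ltg0"))
```

Retrieving by goal rather than scanning the full history mirrors the caption's idea of pulling only relevant information from long-term memory into working memory. A real system would also need the consolidation step that decides which goal a new experience belongs to, which this sketch leaves to the caller.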