Enhancing Multi-Agent Systems via Reinforcement Learning with LLM-based Planner and Graph-based Policy

Ziqi Jia; Junjie Li; Xiaoyang Qu; Jianzong Wang

TL;DR

This paper addresses the challenge of coordinating multiple agents under safety and adaptability constraints by integrating a Large Language Model (LLM) planner with a graph-based multi-agent reinforcement learning (MARL) policy, a framework the authors call LLM-based Graph Collaboration MARL (LGC-MARL). The core contributions are an LLM-based planner paired with a critic, an LLM-driven reward function generator, and a graph-based collaboration meta policy that coordinates agents through an action dependency graph and adapts to new task environments via meta-learning. Empirical results on the AI2-THOR platform show that LGC-MARL outperforms both centralized LLM planning and purely dialog-based LLM methods, achieving higher success rates, faster task completion, and lower LLM token costs; ablations confirm that each component plays a critical role. The work offers a scalable, efficient way to use LLMs in MARL, with practical implications for real-world multi-agent systems that require robust coordination and safety.
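Assuming an edge in the action dependency graph means "this subtask must finish before that one can start", the graph already tells the agents which subtasks can run in parallel. The Python sketch below illustrates only that idea and is not the authors' implementation: the subtask names, the dict-of-prerequisites format, and the `execution_stages` helper are all hypothetical, and a plain topological sort (Kahn's algorithm) stands in for the learned graph-based policy.

```python
from __future__ import annotations


def execution_stages(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group subtasks into stages (hypothetical helper, Kahn's algorithm).

    deps maps each subtask to the subtasks that must finish before it starts;
    every prerequisite must also appear as a key. All subtasks in one stage
    have their prerequisites in earlier stages, so they can be assigned to
    different agents in parallel.
    """
    # Number of unfinished prerequisites per subtask.
    remaining = {task: len(prereqs) for task, prereqs in deps.items()}
    # Reverse edges: prerequisite -> subtasks waiting on it.
    waiting: dict[str, list[str]] = {task: [] for task in deps}
    for task, prereqs in deps.items():
        for prereq in prereqs:
            waiting[prereq].append(task)

    stages: list[list[str]] = []
    ready = sorted(t for t, n in remaining.items() if n == 0)
    while ready:
        stages.append(ready)
        unlocked = []
        for task in ready:
            for child in waiting[task]:
                remaining[child] -= 1
                if remaining[child] == 0:
                    unlocked.append(child)
        ready = sorted(unlocked)

    if sum(len(stage) for stage in stages) != len(deps):
        # Some subtasks never became ready: the graph has a cycle.
        raise ValueError("action dependency graph contains a cycle")
    return stages


# Hypothetical household task in the spirit of AI2-THOR.
deps = {
    "find_apple": [],
    "find_knife": [],
    "open_fridge": [],
    "slice_apple": ["find_apple", "find_knife"],
    "put_slices_in_fridge": ["slice_apple", "open_fridge"],
}
for i, stage in enumerate(execution_stages(deps), start=1):
    print(f"stage {i}: {stage}")
```

With these hypothetical subtasks, the three independent subtasks (`find_apple`, `find_knife`, `open_fridge`) fall into stage 1 and could go to three agents at once, while `slice_apple` and `put_slices_in_fridge` each wait for their own stage. A cyclic graph cannot be ordered at all, which is why the helper raises instead of returning a partial schedule.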

Abstract

Multi-agent systems (MAS) have shown great potential in executing complex tasks, but coordination and safety remain significant challenges. Multi-Agent Reinforcement Learning (MARL) offers a promising framework for agent collaboration, but it faces difficulties in handling complex tasks and designing reward functions. The introduction of Large Language Models (LLMs) has brought stronger reasoning and cognitive abilities to MAS, but existing LLM-based systems struggle to respond quickly and accurately in dynamic environments. To address these challenges, we propose LLM-based Graph Collaboration MARL (LGC-MARL), a framework that efficiently combines LLMs and MARL. This framework decomposes complex tasks into executable subtasks and achieves efficient collaboration among multiple agents through graph-based coordination. Specifically, LGC-MARL consists of two main components: an LLM planner and a graph-based collaboration meta policy. The LLM planner transforms complex task instructions into a series of executable subtasks, evaluates the rationality of these subtasks using a critic model, and generates an action dependency graph. The graph-based collaboration meta policy facilitates communication and collaboration among agents based on the action dependency graph, and adapts to new task environments through meta-learning. Experimental results on the AI2-THOR simulation platform demonstrate the superior performance and scalability of LGC-MARL in completing various complex tasks.
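The abstract describes a loop in which the LLM planner proposes subtasks and a critic model judges whether they are rational before the action dependency graph is built. The sketch below swaps the critic model for a few hand-written rules so the check is concrete and testable; the plan format, the agent names, the rules themselves, and the `critique_plan` helper are hypothetical and not taken from the paper. In a planner-critic loop of this kind, a non-empty result would be returned to the planner as feedback for a revised plan.

```python
from __future__ import annotations


def critique_plan(plan: dict[str, dict], agents: set[str]) -> list[str]:
    """Rule-based stand-in for the critic (hypothetical; the paper uses a model).

    plan maps subtask name -> {"agent": str, "after": list[str]}, where "after"
    lists subtasks that must finish first. Returns a list of problems; an
    empty list means the plan is accepted.
    """
    issues: list[str] = []
    for name, spec in plan.items():
        if spec["agent"] not in agents:
            issues.append(f"{name}: unknown agent {spec['agent']!r}")
        for prereq in spec["after"]:
            if prereq not in plan:
                issues.append(f"{name}: unknown prerequisite {prereq!r}")

    # Depth-first search over known dependencies to detect a cycle.
    state: dict[str, str] = {}  # subtask -> "visiting" or "done"

    def has_cycle(node: str) -> bool:
        state[node] = "visiting"
        for prereq in plan[node]["after"]:
            if prereq not in plan:
                continue  # already reported as unknown above
            if state.get(prereq) == "visiting":
                return True
            if prereq not in state and has_cycle(prereq):
                return True
        state[node] = "done"
        return False

    if any(name not in state and has_cycle(name) for name in plan):
        issues.append("dependencies contain a cycle")
    return issues


agents = {"robot_1", "robot_2"}
good_plan = {
    "find_apple": {"agent": "robot_1", "after": []},
    "find_knife": {"agent": "robot_2", "after": []},
    "slice_apple": {"agent": "robot_1", "after": ["find_apple", "find_knife"]},
}
bad_plan = {
    "slice_apple": {"agent": "robot_3", "after": ["find_knife"]},
    "find_knife": {"agent": "robot_2", "after": ["slice_apple"]},
}
print(critique_plan(good_plan, agents))  # → []
for issue in critique_plan(bad_plan, agents):
    print(issue)
```

The first call accepts the well-formed plan. The second reports the subtask assigned to a nonexistent agent and the circular dependency between `slice_apple` and `find_knife`, which no action dependency graph could order.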
