LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions
Chuanneng Sun, Songjun Huang, Dario Pompili
TL;DR
The paper addresses extending multi-agent reinforcement learning (MARL) with Large Language Models (LLMs) to enable language-based coordination in multi-agent systems (MAS), framing MARL as a decentralized partially observable Markov decision process (Dec-POMDP) with $N$ agents. It reviews traditional MARL foundations (Dec-POMDP and centralized training with decentralized execution), single-agent LLM-based RL, and existing LLM-based MARL frameworks, clarifying the roles of LLMs in decision making, communication, and planning. It identifies four open directions to guide future work: personality-enabled cooperation, language-enabled human-in/on-the-loop frameworks, co-design between MARL and LLMs, and safety/security in MAS. The discussion suggests that language-conditioned MARL can enhance robustness and interpretability in dynamic environments, with future work focusing on efficient on-board deployment and safe, scalable interactions among agents.
Abstract
In recent years, Large Language Models (LLMs) have shown great abilities in various tasks, including question answering, arithmetic problem solving, and poem writing. Although research on LLM-as-an-agent has shown that LLMs can be applied to Reinforcement Learning (RL) and achieve decent results, extending LLM-based RL to Multi-Agent Systems (MAS) is not trivial, as many aspects, such as coordination and communication between agents, are not considered in single-agent RL frameworks. To inspire more research on LLM-based Multi-Agent RL (MARL), in this letter we survey the existing LLM-based single-agent and multi-agent RL frameworks and outline potential directions for future research. In particular, we focus on cooperative tasks in which multiple agents share a common goal and communicate with one another. We also consider human-in/on-the-loop scenarios enabled by the language component in the framework.
