LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions

Chuanneng Sun; Songjun Huang; Dario Pompili

TL;DR

The paper addresses extending multi-agent reinforcement learning (MARL) with Large Language Models (LLMs) to enable language-based coordination in multi-agent systems, framing MARL as a Dec-POMDP with $N$ agents. It reviews traditional MARL foundations (Dec-POMDP and centralized training with decentralized execution), single-agent LLM-based RL, and existing LLM-based MARL frameworks, clarifying roles in decision making, communication, and planning. It identifies four open directions—personality-enabled cooperation, language-enabled human-in/on-the-loop frameworks, co-design between MARL and LLMs, and safety/security in MAS—to guide future work. The discussion suggests that language-conditioned MARL can enhance robustness and interpretability in dynamic environments, with future work focusing on efficient on-board deployment and safe, scalable interactions among agents.

Abstract

In recent years, Large Language Models (LLMs) have shown strong abilities in a variety of tasks, including question answering, arithmetic problem solving, and poem writing. Although research on LLM-as-an-agent has shown that LLMs can be applied to Reinforcement Learning (RL) and achieve decent results, extending LLM-based RL to Multi-Agent Systems (MAS) is not trivial, because many aspects, such as coordination and communication between agents, are not considered in single-agent RL frameworks. To inspire more research on LLM-based Multi-Agent RL (MARL), in this letter we survey the existing LLM-based single-agent and multi-agent RL frameworks and outline potential directions for future research. In particular, we focus on cooperative tasks in which multiple agents share a common goal and communicate with one another. We also consider human-in/on-the-loop scenarios enabled by the language component in the framework.

Paper Structure (12 sections, 2 figures, 1 table)

This paper contains 12 sections, 2 figures, 1 table.

Introduction
Preliminaries
MARL Problem Definition
Traditional MARL
LLM-based Single-Agent RL
Existing LLM-based MARL
Open Research Problems
Personality-enabled Cooperation
Language-enabled Human-in/on-the-Loop Frameworks
Traditional MARL and LLM Co-Design
Safety and Security in MAS
Conclusion

Figures (2)

Figure 1: Well-known Large Language Models (LLMs) over the past three years. Among them, only PaLM-E from Google is trained specifically for embodied applications, e.g., robot control.
Figure 2: Potential research directions for language-conditioned Multi-Agent Reinforcement Learning (MARL). (a) Personality-enabled cooperation, where different robots have different personalities defined by the commands. (b) Language-enabled human-on-the-loop frameworks, where humans supervise robots and provide feedback. (c) Co-design of traditional MARL and LLMs, where knowledge about different aspects of the LLM is distilled into smaller models that can be executed on board.
