An Initial Introduction to Cooperative Multi-Agent Reinforcement Learning

Christopher Amato

TL;DR

This survey consolidates the landscape of cooperative multi-agent reinforcement learning (MARL) by formalizing the cooperative Dec-POMDP setting and delineating three training/execution paradigms: centralized training and execution (CTE), centralized training for decentralized execution (CTDE), and decentralized training and execution (DTE). It emphasizes two primary learning approaches, value-based and policy-gradient, and highlights representative methods that enable scalable, decentralized execution: VDN, QMIX, and QPLEX (value function factorization) and MADDPG, COMA, and MAPPO (centralized critics). The work discusses practical considerations such as nonstationarity, hidden information, concurrent learning, and the role of memory (e.g., DRQN) in partially observable domains, and it clarifies misconceptions about CTDE versus fully decentralized learning. By connecting theory to practice, it identifies open questions about information use, critic design, and scalability, and outlines a broad research agenda for principled, scalable cooperative MARL with real-world impact in domains such as robotics, traffic, and autonomous systems.

Abstract

Multi-agent reinforcement learning (MARL) has exploded in popularity in recent years. While numerous approaches have been developed, they can be broadly categorized into three main types: centralized training and execution (CTE), centralized training for decentralized execution (CTDE), and decentralized training and execution (DTE). CTE methods assume centralization during training and execution (e.g., with fast, free, and perfect communication) and have the most information during execution. CTDE methods are the most common, as they leverage centralized information during training while enabling decentralized execution, in which each agent uses only the information available to it. DTE methods make the fewest assumptions and are often simple to implement. This text is an introduction to cooperative MARL, that is, MARL in which all agents share a single, joint reward. It is meant to explain the setting, basic concepts, and common methods for the CTE, CTDE, and DTE settings. It does not cover all work in cooperative MARL, as the area is quite extensive. I have included work that I believe is important for understanding the main concepts in the area and apologize to those whose work I have omitted. Topics include simple applications of single-agent methods to CTE as well as some more scalable methods that exploit the multi-agent structure, independent Q-learning and policy gradient methods and their extensions, value function factorization methods including the well-known VDN, QMIX, and QPLEX approaches, and centralized critic methods including MADDPG, COMA, and MAPPO. I also discuss common misconceptions, the relationship between different approaches, and some open questions.

Paper Structure

This paper contains 18 sections, 3 equations, 2 figures, 3 algorithms.

Figures (2)

  • Figure 1: A depiction of cooperative MARL as a Dec-POMDP.
  • Figure 5: Concurrent experience replay trajectories (CERTs) (from Omidshafiei17)