Analysing Factorizations of Action-Value Networks for Cooperative Multi-Agent Reinforcement Learning

Jacopo Castellini, Frans A. Oliehoek, Rahul Savani, Shimon Whiteson

TL;DR

This work investigates how neural architectures represent the joint action-value function in cooperative multi-agent reinforcement learning (MARL) by comparing factored and non-factored Q-value representations in one-shot games. It introduces two learning rules (Mixture of Experts and a Factored Q-function approach) over four coordination-graph structures and tests them on both non-factored and factored games, including Generalized Firefighting and Aloha. Across the experiments, higher-order and complete factorizations often yield near-perfect action-value reconstructions and correct action rankings, while joint learners struggle with large joint action spaces; random overlapping factors also perform well in many settings. The results highlight a tradeoff between factor size, coordination requirements, and learning efficiency, and they suggest extending the analysis to sequential settings and recurrent architectures. Overall, the paper provides evidence that factored representations can substantially improve learning power and scalability in cooperative MARL, especially as system size increases.
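
To make the idea of a factored representation concrete, here is a minimal sketch, not the paper's implementation. It assumes the factored approach approximates the joint value as a sum of local terms, $Q(a) \approx \sum_e Q_e(a_e)$, with all factors trained together on the error of the summed prediction. Small lookup tables stand in for the neural networks, and the reward table, the factor list `factors`, and the helpers `fit_factored_q` and `reconstruct` are hypothetical names introduced only for illustration.

```python
import itertools
import numpy as np

def fit_factored_q(reward, factors, n_actions, steps=2000, lr=0.2, seed=0):
    """Fit Q(a) ~= sum_e Q_e(a_e) to a one-shot reward table.

    Assumption: every factor is updated with the error of the summed
    prediction; tables replace the networks used in the paper.
    """
    rng = np.random.default_rng(seed)
    n_agents = reward.ndim
    # One local table per factor, indexed by the actions of its agents.
    tables = [np.zeros((n_actions,) * len(e)) for e in factors]
    for _ in range(steps):
        # Sample a joint action uniformly at random.
        a = tuple(rng.integers(n_actions, size=n_agents))
        local = [tuple(a[i] for i in e) for e in factors]
        pred = sum(t[idx] for t, idx in zip(tables, local))
        err = reward[a] - pred
        # Gradient step on the squared error, shared by all factors.
        for t, idx in zip(tables, local):
            t[idx] += lr * err
    return tables

def reconstruct(tables, factors, n_agents, n_actions):
    """Rebuild the full joint table Q(a) from the local factor tables."""
    q = np.zeros((n_actions,) * n_agents)
    for a in itertools.product(range(n_actions), repeat=n_agents):
        q[a] = sum(t[tuple(a[i] for i in e)] for t, e in zip(tables, factors))
    return q

# Hypothetical 3-agent, 2-action game whose reward decomposes over two
# overlapping pairs of agents: r(a) = f01(a0, a1) + f12(a1, a2).
f01 = np.array([[1.0, 0.0], [0.0, 2.0]])
f12 = np.array([[0.5, 0.0], [0.0, 3.0]])
reward = f01[:, :, None] + f12[None, :, :]

factors = [(0, 1), (1, 2)]
tables = fit_factored_q(reward, factors, n_actions=2)
q_hat = reconstruct(tables, factors, n_agents=3, n_actions=2)
best = np.unravel_index(np.argmax(q_hat), q_hat.shape)
```

In this toy case the reward is exactly representable by the chosen factors, so the reconstruction matches the reward table closely and `best` is the joint action with the highest reward. When the factors cannot express the coordination the game requires, the same procedure only gives an approximation.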

Abstract

Recent years have seen the application of deep reinforcement learning techniques to cooperative multi-agent systems, with great empirical success. However, given the lack of theoretical insight, it remains unclear what the employed neural networks are learning, or how we should enhance their learning power to address the problems on which they fail. In this work, we empirically investigate the learning power of various network architectures on a series of one-shot games. Despite their simplicity, these games capture many of the crucial problems that arise in the multi-agent setting, such as an exponential number of joint actions or the lack of an explicit coordination mechanism. Our results extend those in [4] and quantify how well various approaches can represent the requisite value functions, and help us identify the reasons that can impede good performance, like sparsity of the values or too tight coordination requirements.

Paper Structure

This paper contains 22 sections, 13 equations, 20 figures, 13 tables.

Figures (20)

  • Figure 1: Reconstructed $Q(a)$ for the Dispersion Game and its sparse variant.
  • Figure 2: Reconstructed $Q(a)$ for the Platonia Dilemma.
  • Figure 3: Reconstructed $Q(a)$ for the Climb Game: factored $Q$-function learning approach, and mixture of experts learning approach.
  • Figure 4: Reconstructed $Q(a)$ for the Penalty Game: factored $Q$-function learning approach, and mixture of experts learning approach.
  • Figure 5: Firefighters formation with $n=6$ agents and $N_h=7$ houses.
  • ...and 15 more figures

Theorems & Definitions (2)

  • Definition 1
  • Definition 2