ClusterComm: Discrete Communication in Decentralized MARL using Internal Representation Clustering

Robert Müller, Hasan Turalic, Thomy Phan, Michael Kölle, Jonas Nüßlein, Claudia Linnhoff-Popien

TL;DR

ClusterComm targets scalable, decentralized coordination in MARL by clustering each agent's internal representations with Mini-Batch-K-Means and transmitting the resulting cluster index as a discrete message. The approach uses neither central control nor parameter sharing; agents are trained with PPO, and two variants explore normalization and direct centroid transmission. Across four diverse environments, ClusterComm outperforms NoComm (no communication) and remains competitive with LatentComm (unbounded, continuous communication), indicating that discrete, low-bandwidth communication can support robust cooperative behavior. The results point to internal-representation clustering as a practical route to scalable decentralized coordination; future directions include improved efficiency for many agents and reduced information loss through richer messaging schemes.

Abstract

In the realm of Multi-Agent Reinforcement Learning (MARL), prevailing approaches exhibit shortcomings in aligning with human learning, robustness, and scalability. Addressing this, we introduce ClusterComm, a fully decentralized MARL framework where agents communicate discretely without a central control unit. ClusterComm utilizes Mini-Batch-K-Means clustering on the last hidden layer's activations of an agent's policy network, translating them into discrete messages. This approach outperforms no communication and competes favorably with unbounded, continuous communication, and hence offers a simple yet effective strategy for enhancing collaborative task-solving in MARL.
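
The discretization step described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the class `MiniBatchKMeans`, the helper `fake_hidden`, the initial centroids, and the 2-D toy "activations" are all assumptions made for illustration. It uses the standard mini-batch k-means update with per-centroid learning rates and emits the index of the closest centroid as the message.

```python
import random


def nearest(centroids, x):
    """Return the index of the centroid closest to x (squared Euclidean distance)."""
    dists = [sum((c_i - x_i) ** 2 for c_i, x_i in zip(c, x)) for c in centroids]
    return dists.index(min(dists))


class MiniBatchKMeans:
    """Minimal mini-batch k-means (illustrative; not the paper's code)."""

    def __init__(self, init_centroids):
        self.centroids = [list(c) for c in init_centroids]
        self.counts = [0] * len(self.centroids)

    def partial_fit(self, batch):
        # Assign every sample using the current centroids, then update.
        assignments = [nearest(self.centroids, x) for x in batch]
        for x, k in zip(batch, assignments):
            self.counts[k] += 1
            eta = 1.0 / self.counts[k]  # per-centroid learning rate
            self.centroids[k] = [
                (1 - eta) * c + eta * v for c, v in zip(self.centroids[k], x)
            ]

    def message(self, hidden):
        """Discrete message: the index of the closest centroid."""
        return nearest(self.centroids, hidden)


rng = random.Random(0)


def fake_hidden(center):
    """Hypothetical stand-in for last-hidden-layer activations (2-D toy data)."""
    return [center[0] + rng.gauss(0, 0.1), center[1] + rng.gauss(0, 0.1)]


# Assumed initial centroids; a real setup would choose these differently.
clusterer = MiniBatchKMeans(init_centroids=[[0.5, 0.5], [4.5, 4.5]])
for _ in range(20):
    batch = [fake_hidden((0, 0)) for _ in range(8)]
    batch += [fake_hidden((5, 5)) for _ in range(8)]
    clusterer.partial_fit(batch)

msg_a = clusterer.message([0.05, -0.02])  # activation near the first blob
msg_b = clusterer.message([4.9, 5.1])     # activation near the second blob
```

With k centroids, each message is a single integer in the range 0 to k-1, which is what keeps the channel low-bandwidth compared with sending the full activation vector.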

Paper Structure (16 sections, 3 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: A visual depiction of ClusterComm's workflow for agent $1$. At time $t$, agent $1$ receives observation $o_t^{(1)}$ and the messages $m_{t-1}^{(2)}, \dots, m_{t-1}^{(n)}$ from the other $n-1$ agents. The outputs of the message encoder $\phi_m^{(1)}\left(m_{t-1}^{(2:n)}\right)$ and the observation encoder $\phi_o^{(1)}\left(o_t^{(1)}\right)$ are concatenated and passed through $\phi_{a}^{(1)}$ to compute the next action $a_t^{(1)}$ and the next message $m_t^{(1)}$. Subsequently, messages are discretized using $\text{Mini-Batch K-Means}^{(1)}$ by choosing the cluster index of the closest centroid.
  • Figure 2: Visual depiction of the different domains used in this work.
  • Figure 3: Training curves for all environments.
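
The per-agent data flow in the Figure 1 caption can be sketched in a few lines. This is a shape-level illustration under stated assumptions, not the paper's architecture: the layer sizes, the one-hot encoding of received messages, the random untrained weights, the `tanh` activation, and the greedy action choice are all hypothetical (a PPO-trained policy would sample actions instead).

```python
import math
import random

rng = random.Random(0)


def linear(n_in, n_out):
    """Random weight matrix with n_out rows and n_in columns (untrained)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def apply(weights, x, activation=math.tanh):
    """Apply a bias-free linear layer followed by an activation."""
    return [activation(sum(w * v for w, v in zip(row, x))) for row in weights]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


# Hypothetical sizes chosen only to make the shapes concrete.
N_AGENTS, N_CLUSTERS, OBS_DIM, ENC_DIM, HID_DIM, N_ACTIONS = 3, 4, 5, 6, 8, 2

phi_m = linear((N_AGENTS - 1) * N_CLUSTERS, ENC_DIM)  # message encoder
phi_o = linear(OBS_DIM, ENC_DIM)                      # observation encoder
phi_a = linear(2 * ENC_DIM, HID_DIM)                  # layer on the concatenation
action_head = linear(HID_DIM, N_ACTIONS)


def agent_step(obs, received_msgs):
    """One step for a single agent: returns (action, hidden activations)."""
    # Assumption: discrete messages from the other agents are one-hot encoded.
    msg_vec = [v for m in received_msgs for v in one_hot(m, N_CLUSTERS)]
    joint = apply(phi_m, msg_vec) + apply(phi_o, obs)  # list concatenation
    hidden = apply(phi_a, joint)
    logits = apply(action_head, hidden, activation=lambda z: z)
    action = logits.index(max(logits))  # greedy choice, for illustration only
    return action, hidden


obs = [rng.uniform(-1, 1) for _ in range(OBS_DIM)]
action, hidden = agent_step(obs, received_msgs=[2, 0])
```

The returned `hidden` vector is the continuous quantity that would then be mapped to the cluster index of its closest centroid before being sent to the other agents at the next time step.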