Learning Multi-Agent Communication from Graph Modeling Perspective

Shengchao Hu, Li Shen, Ya Zhang, Dacheng Tao

TL;DR

This paper tackles scalable, bandwidth-aware inter-agent communication in multi-agent reinforcement learning (MARL) by modeling the communication topology as a learnable graph. It introduces CommFormer, a Communication Transformer that treats the adjacency matrix as a learnable parameter and uses continuous relaxation to cast the search as a bi-level optimization problem, so that the graph structure and the policies are trained end-to-end. An encoder with edge-aware attention and an autoregressive decoder, trained with PPO, enable efficient messaging under a sparsity constraint, and the Gumbel-Max trick enforces a k-hot adjacency matrix. Experiments on Predator-Prey, Predator-Capture-Prey, the StarCraft II Multi-Agent Challenge (SMAC), and Google Research Football show that CommFormer outperforms fixed-architecture baselines and closely matches fully connected communication while using less bandwidth; ablations support its robustness and the benefit of architecture search. Overall, the work offers a scalable way to learn a communication topology that adapts to different task demands and agent counts.
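The k-hot sampling step can be illustrated with a small sketch. This is not the authors' implementation: the function name `sample_k_hot_adjacency`, the choice to pick the `k` edges over the whole matrix (rather than per row), and the plain-Python data layout are all assumptions made for illustration. It perturbs each edge score with Gumbel noise and keeps the `k` highest-scoring edges.

```python
import math
import random


def sample_k_hot_adjacency(logits, k, rng):
    """Return an n x n 0/1 matrix with exactly k ones.

    Hypothetical helper: each edge score is perturbed with Gumbel(0, 1)
    noise and the k largest perturbed scores are kept (Gumbel top-k).
    Selecting over the whole matrix is an assumption of this sketch.
    """
    n = len(logits)
    scored = []
    for i in range(n):
        for j in range(n):
            # Clamp away from 0 so that log(u) is always defined.
            u = max(rng.random(), 1e-12)
            gumbel = -math.log(-math.log(u))
            scored.append((logits[i][j] + gumbel, i, j))
    # Highest perturbed scores first.
    scored.sort(reverse=True)
    adjacency = [[0] * n for _ in range(n)]
    for _, i, j in scored[:k]:
        adjacency[i][j] = 1
    return adjacency


if __name__ == "__main__":
    rng = random.Random(0)
    uniform_logits = [[0.0] * 4 for _ in range(4)]
    for row in sample_k_hot_adjacency(uniform_logits, k=6, rng=rng):
        print(row)
```

Raising a logit makes its edge more likely to survive the top-k cut, which is what lets the edge scores act as learnable parameters once the hard selection is relaxed for gradient-based training.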

Abstract

In numerous artificial intelligence applications, the collaborative efforts of multiple intelligent agents are imperative for the successful attainment of target objectives. To enhance coordination among these agents, a distributed communication framework is often employed. However, information sharing among all agents proves to be resource-intensive, while the adoption of a manually pre-defined communication architecture imposes limitations on inter-agent communication, thereby constraining the potential for collaborative efforts. In this study, we introduce a novel approach wherein we conceptualize the communication architecture among agents as a learnable graph. We formulate this problem as the task of determining the communication graph while enabling the architecture parameters to update normally, thus necessitating a bi-level optimization process. Utilizing continuous relaxation of the graph representation and incorporating attention units, our proposed approach, CommFormer, efficiently optimizes the communication graph and concurrently refines architectural parameters through gradient descent in an end-to-end manner. Extensive experiments on a variety of cooperative tasks substantiate the robustness of our model across diverse cooperative scenarios, where agents are able to develop more coordinated and sophisticated strategies regardless of changes in the number of agents.
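The bi-level structure mentioned in the abstract can be written in a generic form. The following is a sketch in notation assumed here, not the paper's own equations: $A$ is the binary communication graph over $N$ agents, $\theta$ denotes the network parameters, $k$ is the edge budget, and $\mathcal{L}_{\text{upper}}$ and $\mathcal{L}_{\text{lower}}$ are placeholder objectives.

$$
\min_{A \in \{0,1\}^{N \times N}} \; \mathcal{L}_{\text{upper}}\bigl(\theta^{*}(A),\, A\bigr)
\quad \text{s.t.} \quad
\theta^{*}(A) = \arg\min_{\theta} \mathcal{L}_{\text{lower}}(\theta,\, A),
\qquad \sum_{i,j} A_{ij} = k .
$$

Continuous relaxation replaces the discrete $A$ with real-valued edge scores, so that both levels can be updated by gradient descent.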

Paper Structure

This paper contains 19 sections, 10 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: The performance of pre-defined communication architectures across various StarCraft II combat games, each run with 10 different seeds. The notable variance observed underscores the importance of searching for the optimal communication architecture.
  • Figure 2: The overview of our proposed CommFormer. CommFormer initiates by establishing the communication graph, which subsequently serves as both the masking and edge embeddings in the encoder and decoder to ensure that agents can exclusively access messages from communicated agents. Subsequently, the encoder and decoder modules come into play, processing a sequence of agents' observations and transforming them into a sequence of optimal actions.
  • Figure 3: Performance comparison on SMAC tasks with different sparsity $\mathcal{S}$. Note that as the value of sparsity $\mathcal{S}$ gradually increases, the performance of CommFormer improves across various environments. This effect is particularly pronounced in environments with a large number of agents.
  • Figure 4: Performance comparison on SMAC tasks with different manually pre-defined communication architectures. CommFormer consistently achieves optimal performance, which underscores its capability to autonomously search for the optimal communication architecture, highlighting its adaptability across various scenarios and tasks.
  • Figure 5: The searching process of CommFormer in the SMAC task 1c3s5z. In this representation, a white square corresponds to a value of 1, indicating the presence of an edge connection.
  • ...and 1 more figure
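The masking role of the communication graph described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the CommFormer encoder: the function name `masked_attention` is hypothetical, `adjacency[i][j] == 1` is assumed to mean that agent `i` receives from agent `j`, every agent is assumed to always see its own features, and edge embeddings are left out.

```python
import math


def masked_attention(queries, keys, values, adjacency):
    """Scaled dot-product attention restricted to graph neighbours.

    Assumptions of this sketch: adjacency[i][j] == 1 means agent i may
    read agent j, and each agent can always attend to itself.
    """
    n = len(queries)
    d = len(queries[0])
    outputs = []
    for i in range(n):
        allowed = [j for j in range(n) if j == i or adjacency[i][j] == 1]
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            for j in allowed
        ]
        # Numerically stable softmax over the allowed agents only.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out = [0.0] * len(values[0])
        for w, j in zip(weights, allowed):
            for c, v in enumerate(values[j]):
                out[c] += (w / total) * v
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0]]
    # With no edges, each agent only sees its own features.
    print(masked_attention(feats, feats, feats, [[0, 0], [0, 0]]))
    # With an edge 0 <- 1, agent 0 mixes in agent 1's features.
    print(masked_attention(feats, feats, feats, [[0, 1], [0, 0]]))
```

Agents outside the allowed set contribute nothing to the output, which is the sense in which the graph restricts each agent to messages from the agents it communicates with.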