Communication Learning in Multi-Agent Systems from Graph Modeling Perspective

Shengchao Hu, Li Shen, Ya Zhang, Dacheng Tao

TL;DR

CommFormer treats the communication architecture among agents as a learnable graph, optimizing the graph and the architectural parameters concurrently through gradient descent in an end-to-end manner. It also introduces a temporal gating mechanism that lets each agent decide dynamically, based on its current observations, whether to receive shared information, which improves decision-making efficiency.

Abstract

In numerous artificial intelligence applications, the collaborative efforts of multiple intelligent agents are imperative for the successful attainment of target objectives. To enhance coordination among these agents, a distributed communication framework is often employed. However, indiscriminate information sharing among all agents can be resource-intensive, and the adoption of manually pre-defined communication architectures imposes constraints on inter-agent communication, thus limiting the potential for effective collaboration. Moreover, the communication framework often remains static during inference, which may result in sustained high resource consumption, as in most cases, only key decisions necessitate information sharing among agents. In this study, we introduce a novel approach wherein we conceptualize the communication architecture among agents as a learnable graph. We formulate this problem as the task of determining the communication graph while enabling the architecture parameters to update normally, thus necessitating a bi-level optimization process. Utilizing continuous relaxation of the graph representation and incorporating attention units, our proposed approach, CommFormer, efficiently optimizes the communication graph and concurrently refines architectural parameters through gradient descent in an end-to-end manner. Additionally, we introduce a temporal gating mechanism for each agent, enabling dynamic decisions on whether to receive shared information at a given time, based on current observations, thus improving decision-making efficiency. Extensive experiments on a variety of cooperative tasks substantiate the robustness of our model across diverse cooperative scenarios, where agents are able to develop more coordinated and sophisticated strategies regardless of changes in the number of agents.
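The two ideas in the abstract, a continuously relaxed communication graph and a per-agent temporal gate, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's implementation: the function names (`relax`, `discretize`, `apply_gate`), the convention that `adj[i][j] == 1` means agent `i` receives from agent `j`, the reading of sparsity as the fraction of the N×N possible edges that are kept, and the fixed gate threshold. In the actual method the edge logits would be trained by gradient descent together with the network parameters; the sketch only shows how learned logits could be turned into a graph and then pruned at a given time step.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relax(logits):
    """Continuous relaxation: map unconstrained edge logits to weights in (0, 1)."""
    return [[sigmoid(v) for v in row] for row in logits]

def discretize(logits, sparsity):
    """Keep the highest-scoring off-diagonal edges; self-loops are always kept.

    Assumption: `sparsity` is the fraction of the n*n possible edges to keep.
    """
    n = len(logits)
    budget = int(sparsity * n * n)
    candidates = sorted(
        ((logits[i][j], i, j) for i in range(n) for j in range(n) if i != j),
        reverse=True,
    )
    adj = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _, i, j in candidates[:budget]:
        adj[i][j] = 1  # agent i receives from agent j (assumed convention)
    return adj

def apply_gate(adj, gate_scores, threshold=0.5):
    """Temporal gate: an agent whose score is below the threshold receives nothing
    from other agents at this time step (its self-loop is kept)."""
    n = len(adj)
    return [
        [adj[i][j] if (i == j or gate_scores[i] >= threshold) else 0 for j in range(n)]
        for i in range(n)
    ]

# Hypothetical learned logits for three agents.
logits = [[0.0, 2.0, -1.0],
          [0.5, 0.0, -3.0],
          [1.5, -2.0, 0.0]]
graph = discretize(logits, sparsity=0.4)
# Hypothetical gate scores computed from each agent's current observation.
gated = apply_gate(graph, gate_scores=[0.9, 0.1, 0.7])
print(graph)
print(gated)
```

In this sketch the gate only removes incoming edges for a single time step, which matches the abstract's description of deciding "whether to receive shared information at a given time"; the underlying graph itself stays fixed after the search.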

Paper Structure

This paper contains 16 sections, 15 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: The performance of pre-defined communication architectures evaluated across various StarCraft II combat scenarios, with each scenario utilizing ten distinct architectures generated from different random seeds. The significant variance in performance metrics highlights the influence of communication architecture on the agents' effectiveness in these complex environments, emphasizing the necessity of searching for the optimal communication configuration.
  • Figure 2: Overview of the proposed CommFormer. CommFormer first establishes the communication graph, which then serves as both the mask and the edge embeddings in the encoder and decoder, ensuring that each agent can access messages only from the agents it communicates with. The encoder and decoder modules then process a sequence of agents' observations and transform it into a sequence of optimal actions. Additionally, CommFormer integrates a dynamic gating mechanism that determines, for each agent, when communication is necessary based on its current observations. For instance, at time step $t$, agents $i$ and $j$ require additional information while agents $k$ and $l$ do not, so the corresponding edges are omitted to conserve resources.
  • Figure 3: Performance comparison on SMAC tasks 1c3s5z, 5m_vs_6m, and 25m with different sparsity $\mathcal{S}$. The first row shows results for CommFormer, while the second row presents results for CommFormer-dyn, which includes an additional dynamic gating mechanism. As the value of sparsity $\mathcal{S}$ increases, both CommFormer and CommFormer-dyn show improved performance across different environments, with this effect being particularly pronounced in scenarios involving a large number of agents.
  • Figure 4: Performance comparison on SMAC tasks with different manually pre-defined communication architectures. CommFormer consistently achieves optimal performance, which underscores its capability to autonomously search for the optimal communication architecture, highlighting its adaptability across various scenarios and tasks.
  • Figure 5: The searching process of CommFormer in the SMAC task 1c3s5z. In this representation, a white square corresponds to a value of 1, indicating the presence of an edge connection.
  • ...and 2 more figures