Robust Multi-agent Communication via Multi-view Message Certification

Lei Yuan, Tao Jiang, Lihe Li, Feng Chen, Zongzhang Zhang, Yang Yu

TL;DR

This paper tackles robustness in cooperative MARL by addressing vulnerabilities in inter-agent communication. It introduces CroMAC, a framework that treats messages as multiple views of the state and fuses them with a multi-view variational autoencoder (MVAE), which uses a product-of-experts to form a joint message representation. It then derives certificates between the joint representation and the individual messages through interval bound propagation, bounding Q-values under worst-case perturbations, and trains the system with a robustness objective under the centralized training, decentralized execution paradigm. Empirical results on Hallway, Level-Based Foraging, Traffic Junction, and SMAC maps show that CroMAC matches, and often surpasses, existing baselines under various perturbation regimes. The work advances practical, certifiable robustness for multi-agent communication, with implications for deployment in noisy real-world environments.

Abstract

Many multi-agent scenarios require message sharing among agents to promote coordination, which makes robust multi-agent communication a pressing need when policies are deployed in environments where messages may be perturbed. Most related works tackle this issue under specific assumptions, such as that only a limited number of message channels suffer perturbations, which limits their efficiency in complex scenarios. In this paper, we take a further step in addressing this issue by learning a robust multi-agent communication policy via multi-view message certification, dubbed CroMAC. Agents trained under CroMAC can obtain guaranteed lower bounds on state-action values, allowing them to identify and choose the optimal action under a worst-case deviation when the received messages are perturbed. Concretely, we first model multi-agent communication as a multi-view problem, where every message stands for a view of the state. We then extract a certified joint message representation with a multi-view variational autoencoder (MVAE) that uses a product-of-experts inference network. In the optimization phase, we apply perturbations in the latent space of the state to obtain a certificate guarantee. The learned joint message representation is then used to approximate the certified state representation during training. Extensive experiments on several cooperative multi-agent benchmarks validate the effectiveness of the proposed CroMAC.
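The product-of-experts fusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes each message encoder outputs a diagonal Gaussian (mean and log-variance) and that a standard-normal prior is included as an extra expert, as is common for MVAEs; the function name `product_of_experts` and its arguments are hypothetical and not taken from the paper.

```python
import numpy as np

def product_of_experts(mus, logvars, include_prior=True):
    """Fuse diagonal-Gaussian experts into a single Gaussian.

    mus, logvars: array-likes of shape (num_experts, latent_dim).
    Returns the mean and variance of the (renormalized) product.
    Assumption: one expert per received message.
    """
    mus = np.asarray(mus, dtype=float)
    logvars = np.asarray(logvars, dtype=float)
    if include_prior:
        # Standard-normal prior expert: mean 0, log-variance 0.
        mus = np.vstack([np.zeros((1, mus.shape[1])), mus])
        logvars = np.vstack([np.zeros((1, logvars.shape[1])), logvars])
    # A product of Gaussians adds precisions and precision-weights the means.
    precisions = np.exp(-logvars)
    joint_var = 1.0 / precisions.sum(axis=0)
    joint_mu = joint_var * (precisions * mus).sum(axis=0)
    return joint_mu, joint_var

# Two unit-variance experts centred at 1 and 3, without the prior expert.
mu, var = product_of_experts([[1.0], [3.0]], [[0.0], [0.0]], include_prior=False)
```

Because precisions add, a confident (low-variance) message pulls the joint representation toward its own mean, while an uncertain message contributes little; a missing message can simply be left out of the product.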

Paper Structure (21 sections, 21 equations, 6 figures, 2 tables, 3 algorithms)

This paper contains 21 sections, 21 equations, 6 figures, 2 tables, 3 algorithms.

Figures (6)

  • Figure 1: Structure of CroMAC. (a) During the training phase, the state is encoded into latent variables $z_{\rm st}$, which are then perturbed to obtain a certificate guarantee between $z_{\rm st}$ and $Q_i(\tau_i,z_{\rm st}\pm \kappa \mathbb{\epsilon}; a_i)$. This process is optimized by minimizing the overlap between the output bounds of the action values, so that the outcomes differ by a large margin. The whole process can be optimized with any value decomposition method such as QMIX (Rashid et al., 2018), and the output of the message aggregation module, $z_{\rm msg}$, is then used to approximate $z_{\rm st}$ by minimizing their distance (e.g., KL divergence). (b) The message aggregation module. Each message $m_{ij}$ is encoded into a latent space via a message encoder $E_j$, where $j\in \{1, \cdots, i - 1, i + 1, \cdots, N\}$, and the parameters of $E_j$ are regularized to obtain certificates between the joint message representation and each message. (c) After training, the learned message aggregation module and other shared modules, such as the trajectory encoder, are used to make decisions in a decentralized way.
  • Figure 2: Multiple benchmarks used in our experiments.
  • Figure 3: Empirical Results of several algorithms tested in two different perturbation conditions on benchmarks. Note that Full-Comm, CroMAC w/o adv, and QMIX are tested in perturbation-free conditions, while CroMAC, CroMAC w/o robust, and AME suffer from message perturbations when testing. See text for more details.
  • Figure 4: Visualization results. We take $t=3$ and $4$ in Hallway, as shown in (a) and (b), where Agent 1 and Agent 2 stand one step from the goal while Agent 3 needs two steps to reach it. (c) and (d) show the PCA projection (Tipping and Bishop, 1999) of the message representation $z_{\rm msg}$ for (a) and (b), with $\bullet$ and $\star$ representing $z_{\rm msg}$ with and without perturbations, respectively. Note that $z_{\rm msg}$ is the same for agents without perturbations, and some $\bullet$ appear darker because several of them overlap. $\blacktriangle$ and $\blacktriangledown$ represent the upper and lower bounds of $z_{\rm msg}$; ellipses of the same color correspond to the same time step. (e) and (f) display the $Q$-values (multiplied by 100 for viewing) of each agent accordingly. The first row shows the original $Q$-values of all actions, and the second row shows them under perturbations, with red fonts marking the selected actions in the corresponding cases. The third and fourth rows show the upper and lower bounds of the $Q$-values under $\mathbb{\epsilon}$-perturbation, where yellow squares are the lower bounds of the $Q$-values of the best actions and blue squares are the upper bounds of the $Q$-values of the other actions.
  • Figure 5: (a) Average test success rates of CroMAC implemented with different value-based MARL methods. (b) Performance comparison with varying sight ranges, where $sn$ means the sight range is $n$ and the default sight range is 9.
  • ...and 1 more figures
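
The bound-and-compare logic described in the captions of Figures 1 and 4 can be sketched with interval bound propagation through a small ReLU network. This is an illustrative sketch under stated assumptions, not the authors' implementation: the Q-head is taken to be a plain MLP over the latent representation alone (the trajectory input $\tau_i$ is omitted), the perturbation is an elementwise box of radius `eps`, and all function names are hypothetical.

```python
import numpy as np

def linear_bounds(lower, upper, W, b):
    """Propagate an axis-aligned box through y = W x + b."""
    center = (upper + lower) / 2.0
    radius = (upper - lower) / 2.0
    out_center = W @ center + b
    out_radius = np.abs(W) @ radius  # worst-case spread per output unit
    return out_center - out_radius, out_center + out_radius

def relu_bounds(lower, upper):
    """ReLU is monotone, so it maps bounds to bounds elementwise."""
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)

def q_bounds(z, eps, layers):
    """Bound the Q-values for every latent in the box [z - eps, z + eps].

    layers: list of (W, b) pairs; ReLU follows every layer but the last.
    """
    lower, upper = z - eps, z + eps
    for i, (W, b) in enumerate(layers):
        lower, upper = linear_bounds(lower, upper, W, b)
        if i < len(layers) - 1:
            lower, upper = relu_bounds(lower, upper)
    return lower, upper

def is_certified(q_lower, q_upper):
    """Return the action with the largest lower bound and whether that
    bound exceeds the upper bound of every other action."""
    best = int(np.argmax(q_lower))
    others = np.delete(q_upper, best)
    return best, bool(np.all(q_lower[best] > others))

# Toy single-layer Q-head with two actions.
layers = [(np.eye(2), np.zeros(2))]
z = np.array([1.0, 0.0])
small = is_certified(*q_bounds(z, 0.1, layers))
large = is_certified(*q_bounds(z, 0.6, layers))
```

When the check passes, no perturbation inside the box can change the greedy action, which corresponds to the yellow-versus-blue comparison in Figure 4. In the toy example the certificate holds for the small radius and fails for the large one, because the two Q-value intervals start to overlap.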

Theorems & Definitions (1)

  • proof