Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game

Simin Li, Jun Guo, Jingqiao Xiu, Ruixiao Xu, Xin Yu, Jiakai Wang, Aishan Liu, Yaodong Yang, Xianglong Liu

TL;DR

This study explores the robustness of cooperative multi-agent reinforcement learning (c-MARL) against Byzantine failures and proposes a Bayesian Adversarial Robust Dec-POMDP framework, which views Byzantine adversaries as nature-dictated types, represented by a separate transition.

Abstract

In this study, we explore the robustness of cooperative multi-agent reinforcement learning (c-MARL) against Byzantine failures, where any agent can enact arbitrary, worst-case actions due to malfunction or adversarial attack. To address the uncertainty that any agent can be adversarial, we propose a Bayesian Adversarial Robust Dec-POMDP (BARDec-POMDP) framework, which views Byzantine adversaries as nature-dictated types, represented by a separate transition. This allows agents to learn policies grounded in their posterior beliefs about the types of other agents, fostering collaboration with identified allies and minimizing vulnerability to adversarial manipulation. We define the optimal solution to the BARDec-POMDP as an ex post robust Bayesian Markov perfect equilibrium, which we prove exists and weakly dominates the equilibrium of previous robust MARL approaches. To realize this equilibrium, we put forward a two-timescale actor-critic algorithm with almost sure convergence under specific conditions. Experiments on matrix games, level-based foraging, and StarCraft II indicate that, even under worst-case perturbations, our method successfully acquires intricate micromanagement skills and adaptively aligns with allies, demonstrating resilience against non-oblivious adversaries, random allies, observation-based attacks, and transfer-based attacks.
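The abstract says agents act on posterior beliefs about the types of other agents, but this summary does not say how those beliefs are computed. As a minimal illustration of what such a posterior means, the sketch below runs an explicit Bayes update of the belief that one teammate is Byzantine ($\theta^j = 1$), assuming (hypothetically) that the ally and adversary action likelihoods are known, stationary, and independent of observations. The function name `update_belief` and all numbers are illustrative, not the authors' implementation.

```python
def update_belief(prior_adv, ally_probs, adv_probs, observed_actions):
    """Return the sequence of beliefs P(theta^j = 1 | actions so far).

    prior_adv:        prior probability that teammate j is Byzantine.
    ally_probs[a]:    assumed probability of action a under the ally policy.
    adv_probs[a]:     assumed probability of action a under the adversary policy.
    observed_actions: action indices teammate j was seen to take.
    """
    belief = prior_adv
    beliefs = [belief]
    for a in observed_actions:
        # Bayes rule: posterior is proportional to prior times likelihood.
        joint_adv = belief * adv_probs[a]
        joint_ally = (1.0 - belief) * ally_probs[a]
        belief = joint_adv / (joint_adv + joint_ally)
        beliefs.append(belief)
    return beliefs


# Hypothetical example: allies mostly play action 0, the adversary mostly action 1.
beliefs = update_belief(
    prior_adv=0.1,
    ally_probs=[0.9, 0.1],
    adv_probs=[0.2, 0.8],
    observed_actions=[1, 1, 0],
)
print([round(b, 3) for b in beliefs])  # → [0.1, 0.471, 0.877, 0.612]
```

Two actions that are unlikely for an ally push the belief from 0.1 to about 0.88; a subsequent ally-like action pulls it back to about 0.61.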

Paper Structure (33 sections, 12 theorems, 33 equations, 7 figures, 9 tables, 1 algorithm)

Key Result

Proposition 2.1

For any robust c-MARL problem with a fixed agent policy, a worst-case (i.e., most harmful) adversary exists.
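In a one-shot matrix game with a fixed agent policy, Proposition 2.1 is easy to see: the adversary controlling the Byzantine agents (the agents with $\theta^i = 1$ in Figure 1) minimizes a team reward that is linear in its own action distribution over a finite set, so a deterministic minimizer always exists. The sketch below simply enumerates that set. It is a hypothetical toy, not the paper's proof, which covers the sequential Dec-POMDP setting; the helper names and the payoff table are assumptions.

```python
from itertools import product


def expected_team_reward(reward, policies, types, adv_actions):
    """Expected team reward for one step of a matrix game.

    reward[joint]:  team reward for a joint action tuple.
    policies[i]:    fixed action distribution of agent i.
    types[i]:       1 if agent i is Byzantine, else 0.
    adv_actions[i]: action the adversary substitutes for Byzantine agent i.
    """
    total = 0.0
    for joint in product(*(range(len(p)) for p in policies)):
        prob = 1.0
        for i, a in enumerate(joint):
            if types[i] == 1:
                # The Byzantine agent's action is replaced by the adversary's.
                prob *= 1.0 if a == adv_actions[i] else 0.0
            else:
                prob *= policies[i][a]
        total += prob * reward[joint]
    return total


def worst_case_adversary(reward, policies, types):
    """Enumerate deterministic adversary actions and keep the most harmful one.

    Expected reward is linear in the adversary's joint action distribution,
    so a deterministic joint action attains the minimum over mixed ones too.
    """
    adv_idx = [i for i, t in enumerate(types) if t == 1]
    best_value, best_actions = float("inf"), None
    for choice in product(*(range(len(policies[i])) for i in adv_idx)):
        adv_actions = [None] * len(policies)
        for i, a in zip(adv_idx, choice):
            adv_actions[i] = a
        value = expected_team_reward(reward, policies, types, adv_actions)
        if value < best_value:
            best_value, best_actions = value, adv_actions
    return best_value, best_actions


# Hypothetical 2-agent coordination game with a fixed, slightly noisy policy.
reward = {(0, 0): 2.0, (0, 1): -1.0, (1, 0): -1.0, (1, 1): 1.0}
policies = [[0.8, 0.2], [0.8, 0.2]]
print(worst_case_adversary(reward, policies, types=[0, 0]))
print(worst_case_adversary(reward, policies, types=[0, 1]))
```

With no Byzantine agent the expected team reward is 1.0; when agent 1 is Byzantine, the adversary picks action 1 and drives it down to about -0.6.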

Figures (7)

  • Figure 1: Framework of c-MARL with Byzantine adversaries. The actions taken by agents with $\theta^i=1$ are replaced by the adversary policy $\hat{\pi}^i$.
  • Figure 2: Ex ante RMPBE obscures the differences between types by taking an expectation over them, while our ex interim RMPBE adapts to the current type.
  • Figure 3: Environments used in our experiments. The toy iterative matrix game was proposed by han2022staterobustmarl. We use map 12x12-4p-3f-c for LBF and map 4m vs 3m for SMAC.
  • Figure 4: Cooperative and robust performance on three c-MARL environments. EIR-MAPPO achieves higher robust performance against non-oblivious adversaries while keeping cooperative performance on par with baselines. Results are reported over 5 seeds for cooperation and $5 \times N$ attacks.
  • Figure 5: Agent behaviors under attack. The red square marks the adversary agent. Existing methods are either swayed, fire without focus, or kite poorly. In contrast, our EIR-MAPPO learns kiting and focused fire simultaneously in the presence of a worst-case adversary.
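The gap that Figure 2 describes can be reproduced in a single-step toy problem. Assume (hypothetically) that an agent knows whether its teammate is an ally or an adversary, with prior probabilities 0.7 and 0.3. An ex ante policy must commit to one action that is best on average over types, whereas an ex interim policy may pick a different action for each type, so its expected value can never be lower. The payoffs and function names are illustrative assumptions, and a full treatment would condition on beliefs over histories rather than on known types.

```python
def ex_ante_policy(prior, payoff):
    """One type-agnostic action that maximizes expected payoff over types."""
    actions = list(next(iter(payoff.values())))

    def value(action):
        return sum(prior[t] * payoff[t][action] for t in prior)

    best = max(actions, key=value)
    return best, value(best)


def ex_interim_policy(prior, payoff):
    """A separate best action for each type, evaluated under the same prior."""
    policy = {t: max(payoff[t], key=payoff[t].get) for t in prior}
    value = sum(prior[t] * payoff[t][policy[t]] for t in prior)
    return policy, value


# Hypothetical payoffs to the robust agent, indexed by the teammate's type.
prior = {"ally": 0.7, "adversary": 0.3}
payoff = {
    "ally": {"cooperate": 3.0, "defend": 1.0},
    "adversary": {"cooperate": -2.0, "defend": 0.0},
}
print(ex_ante_policy(prior, payoff))
print(ex_interim_policy(prior, payoff))
```

Here the ex ante policy cooperates regardless of type (expected value about 1.5), while the ex interim policy defends against the adversary and reaches about 2.1, a small instance of the kind of weak dominance the abstract describes.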

Theorems & Definitions (15)

  • Proposition 2.1: Existence of worst-case adversary
  • Definition 2.1: ex ante robustness
  • Definition 2.2: ex interim robustness
  • Proposition 2.2: Existence of RMPBE
  • Proposition 2.3
  • Definition 3.1
  • Proposition 3.1: Convergence
  • Theorem 3.1
  • Theorem A.1: Kakutani's fixed point theorem
  • Theorem A.2: kardecs2011discounted
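Proposition 3.1 states convergence of the two-timescale actor-critic, but its exact conditions are not reproduced in this summary. For orientation only, a common form of the step-size conditions used in two-timescale stochastic approximation is given below, with $\alpha_t$ the faster and $\beta_t$ the slower learning rate, followed by one schedule that satisfies them. Which components of the algorithm run on which timescale, and any further assumptions the paper makes, are not stated here.

```latex
\sum_{t=0}^{\infty} \alpha_t = \sum_{t=0}^{\infty} \beta_t = \infty, \qquad
\sum_{t=0}^{\infty} \left(\alpha_t^2 + \beta_t^2\right) < \infty, \qquad
\lim_{t \to \infty} \frac{\beta_t}{\alpha_t} = 0.
% Example schedule satisfying all three conditions:
\alpha_t = \frac{1}{(t+1)^{0.6}}, \qquad
\beta_t = \frac{1}{t+1}, \qquad
\frac{\beta_t}{\alpha_t} = \frac{1}{(t+1)^{0.4}} \to 0.
```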