Enhancing the Robustness of QMIX against State-adversarial Attacks

Weiran Guo, Guanjun Liu, Ziyuan Zhou, Ling Wang, Jiacun Wang

TL;DR

This work extends state-adversarial robustness from single-agent to cooperative multi-agent reinforcement learning by applying four SARL defenses to QMIX: gradient-based adversaries, policy regularization, alternating training with learned adversaries (ATLA), and policy adversarial actor director (PA-AD). It formalizes observation perturbations within a state-adversarial stochastic game and Dec-POMDP framework, and evaluates worst-case training and cross-attack testing in the StarCraft II SMAC environment. The study reveals trade-offs: gradient-based attacks are effective and easy to implement, policy regularization is simple but weaker under strong attacks, ATLA can encounter training instability due to expanded state-action spaces, and PA-AD often yields the best overall robustness with more stable training dynamics. These findings advance practical MARL robustness and point to future work on scalable adversarial training, hybrid defenses, and application to additional MARL algorithms.

Abstract

Deep reinforcement learning (DRL) performance is generally degraded by state-adversarial attacks, which perturb an agent's observation. Most recent research has concentrated on making single-agent reinforcement learning (SARL) algorithms robust against such attacks, while robust multi-agent reinforcement learning (MARL) has received little attention. Using QMIX, a popular cooperative multi-agent reinforcement learning algorithm, as an example, we discuss four techniques for improving the robustness of SARL algorithms and extend them to multi-agent scenarios. To increase the robustness of MARL algorithms, we train models under a variety of attacks. We then test each model trained under one attack against the other attacks. In this way, we organize and summarize techniques for enhancing robustness when used with MARL.
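
To make the notion of a state-adversarial attack concrete, the following is a minimal sketch of a single-step, gradient-sign (FGSM-style) perturbation applied to each agent's observation. It is not the paper's implementation: the linear Q-function `W`, `B`, the budget `eps`, and all function names are hypothetical stand-ins for an agent's utility network, chosen so the gradient can be written by hand. The attack raises the cross-entropy between softmax(Q) and the clean greedy action while keeping the perturbation inside an l-infinity ball of radius `eps`.

```python
import math

# Hypothetical linear per-agent Q-function: Q(o) = W o + B
# (2 actions, 3 observation features). A real agent network would replace this.
W = [[1.0, -0.5, 0.2],
     [0.3, 0.8, -0.1]]
B = [0.0, 0.0]


def q_values(obs):
    """Return the Q-value of every action for one agent's observation."""
    return [sum(w * o for w, o in zip(row, obs)) + b for row, b in zip(W, B)]


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_action(obs):
    """Index of the action with the highest Q-value."""
    q = q_values(obs)
    return max(range(len(q)), key=q.__getitem__)


def sign(x):
    return (x > 0) - (x < 0)


def fgsm_perturb(obs, eps):
    """One gradient-sign step that pushes the agent away from its clean greedy action."""
    a_star = greedy_action(obs)
    probs = softmax(q_values(obs))
    # d(cross-entropy)/dQ = softmax(Q) - onehot(a_star)
    dloss_dq = [p - (1.0 if a == a_star else 0.0) for a, p in enumerate(probs)]
    # Chain rule through the linear layer: dQ_a/do_j = W[a][j]
    grad = [sum(dloss_dq[a] * W[a][j] for a in range(len(W)))
            for j in range(len(obs))]
    # Each feature moves by at most eps, so the l-infinity budget is respected.
    return [o + eps * sign(g) for o, g in zip(obs, grad)]


# Two agents, each attacked independently on its own local observation.
clean_obs = [[0.5, 0.1, 0.3], [0.2, 0.4, 0.0]]
eps = 0.2
perturbed_obs = [fgsm_perturb(o, eps) for o in clean_obs]

for o, p in zip(clean_obs, perturbed_obs):
    print("greedy action:", greedy_action(o), "->", greedy_action(p))
```

Under these assumptions the adversary touches only what each agent observes, not the environment's true state or the trained weights. With a neural utility network the hand-written gradient would be replaced by automatic differentiation, and a multi-step variant would iterate the same update with projection back into the `eps` ball.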

Paper Structure (19 sections, 7 equations, 4 figures, 1 table, 2 algorithms)

Figures (4)

  • Figure 1: The macro architecture of QMIX.
  • Figure 2: Gradient-based adversarial attacks in MARL. The state adversary uses a gradient-based optimization method.
  • Figure 3: ATLA attacks in MARL. The state adversary is a neural network.
  • Figure 4: PA-AD attacks in MARL. The state adversary actor is a neural network that generates the policy-perturbing direction $\hat{\textbf{a}}$. The state adversary actor produces an optimal attack according to $\hat{\textbf{a}}$.