Enhancing the Robustness of QMIX against State-adversarial Attacks

Weiran Guo; Guanjun Liu; Ziyuan Zhou; Ling Wang; Jiacun Wang

TL;DR

This work extends state-adversarial robustness from single-agent to cooperative multi-agent reinforcement learning by applying four SARL defenses to QMIX: gradient-based adversaries, policy regularization, alternating training with learned adversaries (ATLA), and policy adversarial actor director (PA-AD). It formalizes observation perturbations within a state-adversarial stochastic game and Dec-POMDP framework, and evaluates worst-case training and cross-attack testing in the StarCraft II SMAC environment. The study reveals trade-offs: gradient-based attacks are effective and easy to implement, policy regularization is simple but weaker under strong attacks, ATLA can encounter training instability due to expanded state-action spaces, and PA-AD often yields the best overall robustness with more stable training dynamics. These findings advance practical MARL robustness and point to future work on scalable adversarial training, hybrid defenses, and application to additional MARL algorithms.

Abstract

The performance of deep reinforcement learning (DRL) is generally degraded by state-adversarial attacks, which perturb an agent's observations. Most recent research has concentrated on making single-agent reinforcement learning (SARL) algorithms robust against such attacks, while robust multi-agent reinforcement learning (MARL) has received little attention. Using QMIX, a popular cooperative multi-agent reinforcement learning algorithm, as an example, we discuss four techniques for improving the robustness of SARL algorithms and extend them to multi-agent scenarios. To increase the robustness of MARL algorithms, we train models under a variety of attacks. We then test each trained model by subjecting it to the attacks that were used to train the other models. In this way, we organize and summarize techniques for enhancing robustness in MARL.

Paper Structure (19 sections, 7 equations, 4 figures, 1 table, 2 algorithms)

Introduction
Related Work
Adversarial Attack and Adversarial Training in SARL
Adversarial Attack in MARL
Adversarial Training in MARL
Background
c-MARL Algorithm: QMIX
State-adversarial Stochastic Game
Dec-POMDP with State Adversary
Methods
Gradient-based Adversary in MARL
Policy Regularization in MARL
ATLA in MARL
PA-AD in MARL
Experiments
...and 4 more sections

Figures (4)

Figure 1: The macro architecture of QMIX.
Figure 2: Gradient-based adversarial attacks in MARL. The state adversary uses a gradient-based optimization method.
Figure 3: ATLA attacks in MARL. The state adversary is a neural network.
Figure 4: PA-AD attacks in MARL. The state adversary actor is a neural network that generates the policy-perturbing direction $\hat{\textbf{a}}$. The state adversary actor produces an optimal attack according to $\hat{\textbf{a}}$.
