What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning?

Songyang Han; Sanbao Su; Sihong He; Shuo Han; Haizhao Yang; Shaofeng Zou; Fei Miao

TL;DR

The paper tackles the vulnerability of multi-agent reinforcement learning policies to adversarial state perturbations. It introduces State-Adversarial Markov Games (SAMGs) and proves that traditional solution concepts, such as the optimal agent policy and the robust Nash equilibrium, may fail to exist under state perturbations. It then defines a robust agent policy that maximizes the worst-case expected state value and proves that such a policy exists for finite SAMGs. To solve SAMGs, the authors propose the Robust Multi-Agent Adversarial Actor-Critic (RMA3C), a gradient-descent-ascent algorithm with a centralized critic and per-agent actor and adversary networks, trained on the maximin objective. Empirical results in cooperative and mixed environments (cooperative navigation, CN; exchange target, ET; keep-away, KA; physical deception, PD) show that RMA3C achieves higher mean episode rewards and greater robustness than the baselines under both random and adversarial perturbations, including settings with more agents and larger perturbation budgets. Overall, the work provides a principled framework and a scalable algorithm for robust MARL under state perturbations, with practical implications for real-world multi-robot and autonomous systems.
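The gradient-descent-ascent idea mentioned above can be illustrated on a toy saddle-point problem. The sketch below is not RMA3C: it has no actors, critics, or environments, and the function `f` and all names are hypothetical choices made for illustration. It only shows the update pattern in which one player ascends and the other descends on the same objective.

```python
# Toy maximin problem: max over x, min over y of
#   f(x, y) = -(x - 1)**2 + x*y + (y - 2)**2
# f is concave in x and convex in y, so it has a unique saddle point.
# (Hypothetical objective, chosen only to illustrate the update rule.)

def grad_x(x, y):
    """Partial derivative of f with respect to x (the maximizing player)."""
    return -2.0 * (x - 1.0) + y


def grad_y(x, y):
    """Partial derivative of f with respect to y (the minimizing player)."""
    return x + 2.0 * (y - 2.0)


def gradient_descent_ascent(x=0.0, y=0.0, lr=0.1, steps=500):
    """Simultaneous gradient ascent on x and gradient descent on y."""
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x += lr * gx  # ascent step for the maximizer
        y -= lr * gy  # descent step for the minimizer
    return x, y


x_star, y_star = gradient_descent_ascent()
print(x_star, y_star)
```

Setting both partial derivatives to zero gives the saddle point x = 1.6, y = 1.2, which the iterates approach for this step size. In the paper's setting the maximizer would correspond to the agents and the minimizer to the adversaries, but convergence on this toy function says nothing about convergence in that setting.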

Abstract

Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents' policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate different solution concepts of MARL under state uncertainties. Our analysis shows that the commonly used solution concepts of optimal agent policy and robust Nash equilibrium do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called robust agent policy, where agents aim to maximize the worst-case expected state value. We prove the existence of robust agent policy for finite state and finite action SAMGs. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies. Our code is public on https://songyanghan.github.io/what_is_solution/.
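As a rough sketch of what "maximize the worst-case expected state value" means, the robust agent policy can be written as a maximin problem. The notation here is an assumption and may differ from the paper's formal definition: $\pi$ is the joint agent policy, $\chi$ the joint adversary policy, $V_{\pi,\chi}(s_0)$ the agents' expected return from initial state $s_0$, and $\Pr(s_0)$ the initial state distribution.

```latex
\pi^{*} \in \arg\max_{\pi} \; \min_{\chi} \;
\mathbb{E}_{s_0 \sim \Pr(s_0)} \left[ V_{\pi,\chi}(s_0) \right]
```

The expectation over $s_0$ is what distinguishes this objective from requiring optimality at every state simultaneously, which the abstract says is not always achievable in SAMGs.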

Paper Structure (52 sections, 16 theorems, 68 equations, 12 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Multi-Agent Reinforcement Learning (MARL)
Robust Reinforcement Learning
State-Adversarial Markov Game (SAMG)
Solution Concepts
Optimal Adversary Policy
State-robust Totally Optimal Agent Policy
Robust Total Nash Equilibrium
Robust Agent Policy
Robust Multi-Agent Adversarial Actor-Critic (RMA3C) Algorithm
Experiments
Baselines
Comparison Results
Training Comparison Under Different Perturbations
...and 37 more sections

Key Result

Proposition 3.2

When the adversary policy is a fixed policy, the SAMG problem becomes a Dec-POMDP (Oliehoek and Amato, 2016).
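The intuition behind this proposition can be sketched in a few lines: a fixed adversary that maps true states to perturbed states behaves like a fixed observation function. The example below is a single-agent, tabular simplification with made-up probabilities. The names `chi`, `pi`, and `induced_action_distribution` are hypothetical and not taken from the paper.

```python
# Hypothetical fixed adversary chi[s][s_hat] = P(perturbed state s_hat | true state s).
# Each row is a probability distribution over perturbed states.
chi = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
]

# Hypothetical agent policy pi[s_hat][a] = P(action a | perturbed state s_hat).
pi = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 1.0],
]


def induced_action_distribution(chi, pi):
    """Return P(a | s) = sum over s_hat of chi[s][s_hat] * pi[s_hat][a]."""
    n_actions = len(pi[0])
    return [
        [sum(row[k] * pi[k][a] for k in range(len(pi))) for a in range(n_actions)]
        for row in chi
    ]


effective = induced_action_distribution(chi, pi)
for s, dist in enumerate(effective):
    print(f"true state {s}: action distribution {dist}")
```

Because `chi` does not depend on `pi`, it plays the role of an observation probability function here. Once the adversary is allowed to adapt to the agents' policy, `chi` is no longer fixed and this reduction does not apply.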

Figures (12)

Figure 1: The agents' goal is to occupy and cover all landmarks, which requires cooperation to decide which landmark each agent covers. Panel (a) illustrates the optimal target landmark for each agent without state perturbation. In panel (b), an adversary perturbs the agents' state observations, causing agents to head in the wrong direction and leaving landmark 1 uncovered. Our work demonstrates that traditional agent policies can be easily corrupted by adversarial state perturbations. To counter this, we propose a robust agent policy that maximizes average performance under worst-case state perturbations.
Figure 2: Comparison between Dec-POMDP and SAMG. In a Dec-POMDP, the observation probability function is fixed and does not change when the agent policy changes. In an SAMG, however, the adversary policy is not fixed: it may change according to the agents' policies and always selects the worst-case state perturbation for the agents. In an SAMG, each agent is associated with an adversary that perturbs its knowledge or observation of the true state. Agents want to find a policy $\pi$ that maximizes their total expected return, while adversaries want to find a policy $\chi$ that minimizes the agents' total expected return.
Figure 3: Solution concepts for the SAMGs. We first examine the widely used concepts (optimal agent policy and robust Nash Equilibrium) and demonstrate their non-existence under adversarial state perturbations. In response, we consider a new objective, the worst-case expected state value, and a new solution concept, the robust agent policy.
Figure 4: Our RMA3C algorithm compared with several baseline algorithms in training. The results show that our RMA3C algorithm outperforms the baselines, achieving higher mean episode rewards and greater robustness to state perturbations. The baselines were trained under either random state perturbations or a well-trained adversary policy $\chi^*$ (adversaries that are trained for the maximum training episodes in RMA3C). Overall, our RMA3C algorithm achieved up to 58.46% higher mean episode rewards than the baselines.
Figure 5: (a) Our RMA3C algorithm continues to achieve higher mean episode rewards, even with an increasing number of agents in the environment. (b) Our RMA3C algorithm is trained in the cooperative navigation environment with different perturbation budgets $d$. As $d$ increases, adversaries gain more advantage and may further decrease the agents' total expected return.
...and 7 more figures

Theorems & Definitions (38)

Definition 3.1: Admissible Perturbed State Set
Proposition 3.2
Proposition 3.3
Proposition 4.1: Existence of Optimal Adversary Policy
Definition 4.2: State-robust Totally Optimal Agent Policy
Theorem 4.3: Non-existence of State-robust Totally Optimal Agent Policy
Definition 4.4: Robust state value function
Theorem 4.5: Existence of Unique Robust State Value Function
Definition 4.6: Robust Total Nash Equilibrium
Theorem 4.7: Non-existence of Robust Total Nash Equilibrium
...and 28 more
