What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning?
Songyang Han, Sanbao Su, Sihong He, Shuo Han, Haizhao Yang, Shaofeng Zou, Fei Miao
TL;DR
The paper tackles the vulnerability of multi-agent reinforcement learning (MARL) policies to adversarial state perturbations. It introduces State-Adversarial Markov Games (SAMGs) and proves that traditional solution concepts, such as the optimal agent policy and the robust Nash equilibrium, may fail to exist under state perturbations. It then defines a robust agent policy that maximizes the worst-case expected state value and proves that such a policy exists for finite-state, finite-action SAMGs. To solve SAMGs, the authors propose the Robust Multi-Agent Adversarial Actor-Critic (RMA3C), a gradient-descent-ascent-based algorithm with a centralized critic and per-agent actor and adversary networks, trained to solve the maximin objective. Empirical results in cooperative and mixed environments (CN, ET, KA, PD) show that RMA3C achieves substantially higher robustness and mean rewards than baselines under both random and adversarial perturbations, including in scenarios with more agents and larger perturbation budgets. Overall, the work provides a principled framework and a scalable algorithm for robust MARL under state perturbations, with practical implications for real-world multi-robot and autonomous systems.
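The maximin structure behind RMA3C's gradient descent ascent can be illustrated with a deliberately tiny example. The sketch below is not RMA3C. It replaces the actor, adversary, and centralized critic networks with two scalars: x (the agent's decision) and y (the adversary's state perturbation). It also uses a hypothetical concave-convex objective f(x, y) = -x^2 + 2x + x*y + y^2 in place of a learned value. The agent takes gradient-ascent steps on f. The adversary takes gradient-descent steps and is projected back onto an assumed perturbation budget |y| <= eps.

```python
def grad_f(x, y):
    """Partial derivatives of the hypothetical toy objective f(x, y) = -x**2 + 2*x + x*y + y**2."""
    df_dx = -2 * x + 2 + y
    df_dy = x + 2 * y
    return df_dx, df_dy


def clip(v, lo, hi):
    """Project a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def projected_gda(eps=0.2, lr=0.1, steps=2000):
    """Simultaneous gradient ascent on x (agent) and projected gradient descent on y (adversary)."""
    x, y = 0.0, 0.0
    for _ in range(steps):
        gx, gy = grad_f(x, y)              # both gradients use the current iterate
        x = x + lr * gx                    # agent: ascend f
        y = clip(y - lr * gy, -eps, eps)   # adversary: descend f, stay within the budget
    return x, y


x_star, y_star = projected_gda()
print(f"x* = {x_star:.4f}, y* = {y_star:.4f}")
```

With eps = 0.2, the adversary's unconstrained best response y = -x/2 falls outside the budget. The adversary therefore settles on the boundary y = -0.2, and the agent's best reply to that is x = 0.9, so the iterates approach this saddle point. Simultaneous updates with a small step size work here because the toy objective is strongly concave in x and strongly convex in y. For non-convex neural-network objectives, plain GDA carries no such convergence guarantee.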
Abstract
Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed under the assumption that agents' policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate different solution concepts of MARL under state uncertainties. Our analysis shows that the commonly used solution concepts of optimal agent policy and robust Nash equilibrium do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called robust agent policy, where agents aim to maximize the worst-case expected state value. We prove the existence of a robust agent policy for finite-state and finite-action SAMGs. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies. Our code is publicly available at https://songyanghan.github.io/what_is_solution/.
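The difference between an optimal agent policy and a robust agent policy can be seen in a hand-made, single-agent, one-step toy problem. All states, rewards, perturbation sets, and the initial distribution below are hypothetical and not taken from the paper. The adversary may show the agent any state in an assumed perturbation set B(s), and the agent's deterministic policy maps the observed state to an action. The code first enumerates all deterministic policies and checks whether any one of them attains the best worst-case value in every state simultaneously, which is what an optimal agent policy would require. It then picks the policy that maximizes the worst-case value averaged over the initial state distribution.

```python
from itertools import product

# Toy, hypothetical setting (not from the paper): one agent, one step,
# two true states and two actions.
STATES = ["s0", "s1"]
ACTIONS = ["a0", "a1"]
REWARD = {("s0", "a0"): 1.0, ("s0", "a1"): 0.2,
          ("s1", "a0"): 0.0, ("s1", "a1"): 2.0}
# Assumed perturbation sets B(s): the adversary may show the agent any state in B(s).
PERTURB = {"s0": ["s0", "s1"], "s1": ["s1"]}
# Assumed initial (true) state distribution.
RHO = {"s0": 0.5, "s1": 0.5}


def worst_case_values(policy):
    """Per-state value when the adversary picks the most harmful observed state."""
    return {s: min(REWARD[(s, policy[obs])] for obs in PERTURB[s]) for s in STATES}


def expected_worst_case(policy):
    """Worst-case state value averaged over the initial state distribution."""
    values = worst_case_values(policy)
    return sum(RHO[s] * values[s] for s in STATES)


# Enumerate all deterministic policies (observed state -> action).
policies = [dict(zip(STATES, acts)) for acts in product(ACTIONS, repeat=len(STATES))]

# Best achievable worst-case value in each state, taken over all policies.
best_per_state = {s: max(worst_case_values(p)[s] for p in policies) for s in STATES}

# Policies that attain every per-state best at once (an "optimal" policy); none here.
optimal = [p for p in policies
           if all(worst_case_values(p)[s] == best_per_state[s] for s in STATES)]

# The robust policy instead maximizes the expected worst-case value
# (max keeps the first maximizer in enumeration order if there is a tie).
robust = max(policies, key=expected_worst_case)

print("per-state best:", best_per_state)
print("policies optimal at every state:", optimal)
print("robust policy:", robust, "value:", expected_worst_case(robust))
```

Here the best worst-case value is 1.0 in s0 and 2.0 in s1, but no policy reaches both. Earning 1.0 in s0 requires choosing a0 even when s1 is observed, which forfeits the reward in s1. The robust criterion still returns a well-defined answer, {'s0': 'a0', 's1': 'a1'} with value 1.1 (the all-a1 policy ties). This only illustrates the two definitions. The paper's existence results cover multi-agent, multi-step SAMGs.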
