Attacking Cooperative Multi-Agent Reinforcement Learning by Adversarial Minority Influence

Simin Li; Jun Guo; Jingqiao Xiu; Yuwei Zheng; Pu Feng; Xin Yu; Aishan Liu; Yaodong Yang; Bo An; Wenjun Wu; Xianglong Liu

TL;DR

The paper addresses the vulnerability of cooperative multi-agent reinforcement learning (c-MARL) to adversarial attacks. It introduces Adversarial Minority Influence (AMI), a black-box, policy-based attack that uses a unilateral influence term $I^\alpha_t$ and a Targeted Adversarial Oracle (TAO) to steer victims toward globally worst-case targets, even without access to victim parameters. AMI is validated on real-world robot swarms and in simulated domains such as StarCraft II and Multi-Agent MuJoCo (MAMujoco), where it outperforms baseline attacks and exhibits adaptive victim-targeting behavior. The work provides both a practical attack toolkit for robustness testing and insights into defense design for c-MARL systems in risk-sensitive applications.

Abstract

This study probes the vulnerabilities of cooperative multi-agent reinforcement learning (c-MARL) under adversarial attacks, a critical determinant of c-MARL's worst-case performance prior to real-world implementation. Current observation-based attacks, constrained by white-box assumptions, overlook c-MARL's complex multi-agent interactions and cooperative objectives, resulting in impractical and limited attack capabilities. To address these shortcomings, we propose Adversarial Minority Influence (AMI), a practical and strong attack for c-MARL. AMI is a practical black-box attack and can be launched without knowing victim parameters. AMI is also strong because it considers the complex multi-agent interactions and the cooperative goal of agents, enabling a single adversarial agent to unilaterally mislead the majority of victims into forming targeted worst-case cooperation. This mirrors minority influence phenomena in social psychology. To achieve maximum deviation in victim policies under complex agent-wise interactions, our unilateral attack aims to characterize and maximize the impact of the adversary on the victims. This is achieved by adapting a unilateral agent-wise relation metric derived from mutual information, thereby mitigating the adverse effects of victim influence on the adversary. To lead the victims into a jointly detrimental scenario, our targeted attack deceives victims into a long-term, cooperatively harmful situation by guiding each victim towards a specific target, determined through a trial-and-error process executed by a reinforcement learning agent. Through AMI, we achieve the first successful attack against real-world robot swarms and effectively fool agents in simulated environments into collectively worst-case scenarios, including StarCraft II and Multi-Agent MuJoCo. The source code and demonstrations can be found at: https://github.com/DIG-Beihang/AMI.
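
To make the idea of a unilateral, mutual-information-derived influence metric more concrete, the following is a minimal sketch and not the paper's exact definition of $I^\alpha_t$. It assumes discrete actions and a hypothetical table `victim_given_adv[a][v]` giving the probability that a victim takes action `v` when the adversary takes action `a` (in a black-box setting this table would have to come from a learned model of the victim, not from victim parameters). The function names are illustrative. The sketch scores an adversary action by the KL divergence between the victim's conditional action distribution and its marginal over adversary actions; averaging that score under the adversary's policy recovers the mutual information between the two agents' actions.

```python
import math
from typing import Sequence

Dist = Sequence[float]


def kl_divergence(p: Dist, q: Dist) -> float:
    """KL(p || q) in nats for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def marginal_victim_policy(adv_policy: Dist, victim_given_adv: Sequence[Dist]) -> list:
    """Victim action distribution with the adversary's action averaged out."""
    n_victim_actions = len(victim_given_adv[0])
    return [
        sum(adv_policy[a] * victim_given_adv[a][v] for a in range(len(adv_policy)))
        for v in range(n_victim_actions)
    ]


def unilateral_influence(adv_action: int, adv_policy: Dist,
                         victim_given_adv: Sequence[Dist]) -> float:
    """Hypothetical one-directional score: how far the chosen adversary action
    moves the victim's action distribution away from its marginal.
    Assumes adv_policy[adv_action] > 0 so the marginal covers the conditional."""
    marginal = marginal_victim_policy(adv_policy, victim_given_adv)
    return kl_divergence(victim_given_adv[adv_action], marginal)


def mutual_information(adv_policy: Dist, victim_given_adv: Sequence[Dist]) -> float:
    """Expected influence under the adversary's policy, i.e. I(a_adv; a_victim)."""
    return sum(
        adv_policy[a] * unilateral_influence(a, adv_policy, victim_given_adv)
        for a in range(len(adv_policy))
        if adv_policy[a] > 0.0
    )


if __name__ == "__main__":
    adv_policy = [0.5, 0.5]                       # assumed adversary policy
    responsive = [[0.9, 0.1], [0.2, 0.8]]         # victim reacts to the adversary
    indifferent = [[0.3, 0.7], [0.3, 0.7]]        # victim ignores the adversary
    print(unilateral_influence(0, adv_policy, responsive))
    print(unilateral_influence(0, adv_policy, indifferent))
```

Because the score only measures how the adversary's action shifts the victim's distribution, a victim that ignores the adversary yields a score of zero, while the reverse direction (the victims' effect on the adversary) never enters the computation.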

Paper Structure (27 sections, 17 equations, 13 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Overview of Adversarial Attacks
RL Attacks by Observation Perturbation
RL Attacks by Adversarial Policy
Problem Formulation
Method
Unilateral Influence Filter
Targeted Adversarial Oracle
Overall Training
Experiments
Experimental Setup
Environments
Compared methods and evaluation metrics
AMI Attack in Real World
...and 12 more sections

Figures (13)

Figure 1: While observation-based attacks require white-box access to the victim and manipulate agent observations directly, our adversarial minority influence attack is a black-box, policy-based attack that leverages one minority attacker to unilaterally influence the majority of victims towards a jointly worst target.
Figure 2: Framework of AMI. The unilateral influence filter decomposes mutual information into minority influence and majority influence terms, retaining the minority influence term (the adversary's influence on the victims) to obtain an asymmetric influence measure. The targeted adversarial oracle is an RL agent that generates a worst-case target for each victim. Attacking victims towards this target results in jointly worst cooperation.
Figure 3: Illustration of the robot and playground for our real-world multi-robot rendezvous environment.
Figure 4: Comparisons of AMI against baselines in simulation and real-world experiments.
Figure 5: Behaviors of robot swarms under our AMI attack, with the adversary indicated by a red square. AMI is the only attack whose adversary lures an agent away from the swarm to group with the adversary.
...and 8 more figures
