Neutral Agent-based Adversarial Policy Learning against Deep Reinforcement Learning in Multi-party Open Systems

Qizhou Peng, Yang Zheng, Yu Wen, Yanna Wu, Yingying Du

TL;DR

This work tackles the vulnerability of deep reinforcement learning in multi-party open systems by proposing neutral agent-based adversarial policy learning that does not require environment manipulation or direct victim interaction. It introduces a reward design grounded in victim failure paths and an estimation-based reward model using LSTM to operate under partial observability, all within a QMIX-based MARL framework that maximizes the total Q-value $Q^{tot}$. Empirically, the method generalizes across StarCraft II SMAC and Highway-env tasks, achieving substantial reductions in victim success and faster convergence than baselines; increasing adversarial density enhances effectiveness, and common defenses offer limited protection in open settings. The results underscore practical risks for open DRL systems and provide guidance for designing robust defenses against neutral-agent adversarial strategies.

Abstract

Reinforcement learning (RL) has been an important machine learning paradigm for solving long-horizon sequential decision-making problems under uncertainty. Integrating deep neural networks (DNNs) into the RL framework gave rise to deep reinforcement learning (DRL), which has achieved significant success in various domains. However, the integration of DNNs also makes DRL vulnerable to adversarial attacks. Existing adversarial attack techniques mainly focus on either directly manipulating the environment with which a victim agent interacts or deploying an adversarial agent that interacts with the victim agent to induce abnormal behaviors. While these techniques achieve promising results, their adoption in multi-party open systems remains limited for two major reasons: the impractical assumption of full control over the environment and the dependence on interactions with victim agents. To enable adversarial attacks in multi-party open systems, in this paper, we redesign an adversarial policy learning approach that can mislead well-trained victim agents without requiring direct interactions with these agents or full control over their environments. In particular, we propose a neutral agent-based approach that applies across various task scenarios in multi-party open systems. Although the neutral agents are seemingly detached from the victim agents, they indirectly influence them through the shared environment. We evaluate our proposed method on the SMAC platform based on StarCraft II and the autonomous driving simulation platform Highway-env. The experimental results demonstrate that our method can launch general and effective adversarial attacks in multi-party open systems.

Paper Structure

This paper contains 38 sections, 4 theorems, 26 equations, 7 figures, 3 tables, 1 algorithm.

Key Result

Proposition 1

In a multi-party open system, if all agents follow fixed policies except the agents of one specific party, the state transition of the environment depends only on the joint policy of the agents belonging to that party, rather than on the joint policy of all agents in the system.
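
The intuition behind Proposition 1 can be sketched as a marginalization over the fixed agents' actions. The notation below is an assumption made for illustration and is not taken from the paper: $A$ denotes the learning party, $-A$ all other agents, $\pi_A$ and $\pi_{-A}$ their joint policies, and $P$ the environment's transition kernel. For simplicity the sketch also assumes that policies condition on the full state $s$, whereas the paper works under partial observability.

$$
\begin{aligned}
P_{\pi_A}(s' \mid s)
  &= \sum_{a_A} \pi_A(a_A \mid s) \sum_{a_{-A}} \pi_{-A}(a_{-A} \mid s)\, P(s' \mid s, a_A, a_{-A}) \\
  &= \sum_{a_A} \pi_A(a_A \mid s)\, \tilde{P}(s' \mid s, a_A),
\qquad
\tilde{P}(s' \mid s, a_A) := \sum_{a_{-A}} \pi_{-A}(a_{-A} \mid s)\, P(s' \mid s, a_A, a_{-A}).
\end{aligned}
$$

Because $\pi_{-A}$ is fixed, $\tilde{P}$ is a fixed kernel. From the learning party's point of view, the other agents are absorbed into the environment, and $\pi_A$ is the only remaining source of variation in the state transition.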

Figures (7)

  • Figure 1: Categories of Reinforcement Learning Task Environments
  • Figure 2: The algorithm framework of our proposed method
  • Figure 3: Estimation-based reward model framework
  • Figure 4: Comparison of winning-rate trends during adversarial agent training across different reward models on StarCraft II maps
  • Figure 5: Possible failure paths in the autonomous driving task: a collision occurs, the destination is not reached before the time limit, or a traffic rule is violated (driving against the traffic flow).
  • ...and 2 more figures

Theorems & Definitions (6)

  • Proposition 1
  • Proposition 2
  • Proposition 1
  • proof
  • Proposition 2
  • proof