A Spatiotemporal Stealthy Backdoor Attack against Cooperative Multi-Agent Deep Reinforcement Learning

Yinbo Yu, Saihao Yan, Jiajia Liu

TL;DR

The paper addresses the vulnerability of cooperative multi-agent DRL (c-MADRL) to backdoor attacks by embedding a stealthy backdoor in a single agent. It introduces a spatiotemporal trigger, defined as T := (Psi, zeta), that uses sequences of observations rather than a single observation to activate malicious actions. It pairs this trigger with a reward-hacking strategy that reverses the backdoored agent’s reward during an attack window and adds a unilateral-influence term weighted by a tunable parameter lambda. The threat model is formalized as a Dec-POMDP, and empirical evaluation on VDN and QMIX in the SMAC environment shows that the attack reaches a high attack success rate (ASR, up to 91.6%) while keeping the clean performance variance rate (CPVR) low (3.7%), even though only one agent is compromised. The results underscore the practicality and stealth of the attack, highlight the need for defenses against single-agent backdoors in c-MADRL, and motivate future work on black-box settings and robust defenses.
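The idea of a trigger that depends on a sequence of observations can be illustrated with a minimal sketch. This summary does not spell out the semantics of Psi and zeta, so the code below assumes one plausible reading: Psi is a predicate over a single observation, and zeta is the number of consecutive time steps for which Psi must hold. The class name `SpatiotemporalTrigger` and the example predicate are hypothetical and are not taken from the paper.

```python
from collections import deque
from typing import Callable, Deque, Sequence

Observation = Sequence[float]


class SpatiotemporalTrigger:
    """Hypothetical trigger T := (psi, zeta).

    Assumption: the trigger fires once `psi` has held on each of the
    last `zeta` consecutive observations.
    """

    def __init__(self, psi: Callable[[Observation], bool], zeta: int) -> None:
        if zeta < 1:
            raise ValueError("zeta must be at least 1")
        self.psi = psi
        self.zeta = zeta
        # Sliding window holding the psi result for the last `zeta` steps.
        self._hits: Deque[bool] = deque(maxlen=zeta)

    def step(self, obs: Observation) -> bool:
        """Record one observation and report whether the trigger is active."""
        self._hits.append(bool(self.psi(obs)))
        return len(self._hits) == self.zeta and all(self._hits)

    def reset(self) -> None:
        """Clear the window, e.g. at the start of a new episode."""
        self._hits.clear()


# Illustrative predicate: the first observation feature (say, a normalized
# distance to an enemy unit) stays below 0.5. Purely an assumption.
trigger = SpatiotemporalTrigger(psi=lambda obs: obs[0] < 0.5, zeta=3)
observations = [[0.4], [0.3], [0.9], [0.2], [0.1], [0.0]]
fired = [trigger.step(obs) for obs in observations]
```

In this sketch the single observation of 0.9 interrupts the pattern, so the trigger fires only at the last step, once three consecutive observations satisfy the predicate. A trigger based on one fixed observation would not have this temporal dependence.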

Abstract

Recent studies have shown that cooperative multi-agent deep reinforcement learning (c-MADRL) is under the threat of backdoor attacks. Once a backdoor trigger is observed, the backdoored agent performs abnormal actions that lead to failures or malicious goals. However, existing backdoors suffer from several issues, e.g., fixed visual trigger patterns lack stealthiness, the backdoor is trained or activated by an additional network, or all agents are backdoored. To this end, in this paper, we propose a novel backdoor attack against c-MADRL, which attacks the entire multi-agent team by embedding the backdoor only in a single agent. Firstly, we introduce adversary spatiotemporal behavior patterns as the backdoor trigger, rather than manually injected fixed visual patterns or instant status, and control the attack duration. This method can guarantee the stealthiness and practicality of injected backdoors. Secondly, we hack the original reward function of the backdoored agent via reward reversal and unilateral guidance during training to ensure its adverse influence on the entire team. We evaluate our backdoor attacks on two classic c-MADRL algorithms, VDN and QMIX, in a popular c-MADRL environment, SMAC. The experimental results demonstrate that our backdoor attacks are able to reach a high attack success rate (91.6%) while maintaining a low clean performance variance rate (3.7%).
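The reward hacking described above can be sketched in a few lines. The exact reward formula is not given in this summary, so the code is only an illustration under stated assumptions: inside the attack window the original reward is negated and an influence score is added with weight lambda, and outside the window the reward is left unchanged. The function names, the window definition, and the scalar `influence` argument are hypothetical.

```python
def in_attack_window(t: int, trigger_step: int, duration: int) -> bool:
    """Assumed window: `duration` steps starting at the step the trigger fired."""
    return trigger_step <= t < trigger_step + duration


def hacked_reward(
    reward: float,
    influence: float,
    attacking: bool,
    lam: float = 0.5,
) -> float:
    """Illustrative training reward for the backdoored agent.

    Assumption: during an attack the original reward is reversed and a
    unilateral-influence score is added with weight `lam`; otherwise the
    original reward is returned so clean behavior is preserved.
    """
    if not attacking:
        return reward
    return -reward + lam * influence


# Example: trigger fired at step 10 and the attack lasts 5 steps.
clean = hacked_reward(2.0, 1.0, in_attack_window(3, 10, 5))
poisoned = hacked_reward(2.0, 1.0, in_attack_window(12, 10, 5), lam=0.5)
```

Keeping the reward unchanged outside the window is what lets the backdoored agent behave normally in clean episodes, which is consistent with the low clean performance variance rate reported above.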

Paper Structure

This paper contains 13 sections, 5 equations, 4 figures, 1 table, 1 algorithm.

Figures (4)

  • Figure 1: The framework of the proposed backdoor attack.
  • Figure 2: The episode rewards and winning rates of the backdoored models attacking VDN and QMIX.
  • Figure 3: The behaviors of all agents in an attack period. The trigger is shown in the green rectangle, the backdoored agent is shown in the red rectangle and affected clean agents are shown in yellow rectangles. Numbers represent the ID of clean agents.
  • Figure 4: Action distribution of all clean agents in a clean episode and a poisoned episode. The green rectangle represents the behavior of focused fire.