Tactics of Adversarial Attack on Deep Reinforcement Learning Agents

Yen-Chen Lin, Zhang-Wei Hong, Yuan-Hong Liao, Meng-Li Shih, Ming-Yu Liu, Min Sun

TL;DR

This work investigates adversarial vulnerabilities in deep reinforcement learning by proposing two tactics: strategically-timed attacks, which reduce the agent's reward while perturbing only a small subset of time steps, and enchanting attacks, which plan a sequence of perturbations that drives the agent to a target state. The first tactic combines a timing heuristic based on the agent's action preference with a Carlini-Wagner perturbation; the second uses a planning pipeline in which video frame prediction and cross-entropy search select the action sequence. Empirical results on DQN and A3C across five Atari games show that strategically-timed attacks can match the impact of uniform attacks while perturbing only ~25% of time steps, and that enchanting attacks reach the target state with >70% success, highlighting substantial robustness concerns. The paper also introduces a planning-based adversarial framework and discusses directions for defenses against such manipulations.

Abstract

We introduce two tactics to attack agents trained by deep reinforcement learning algorithms using adversarial examples, namely the strategically-timed attack and the enchanting attack. In the strategically-timed attack, the adversary aims at minimizing the agent's reward by only attacking the agent at a small subset of time steps in an episode. Limiting the attack activity to this subset helps prevent detection of the attack by the agent. We propose a novel method to determine when an adversarial example should be crafted and applied. In the enchanting attack, the adversary aims at luring the agent to a designated target state. This is achieved by combining a generative model and a planning algorithm: while the generative model predicts the future states, the planning algorithm generates a preferred sequence of actions for luring the agent. A sequence of adversarial examples is then crafted to lure the agent to take the preferred sequence of actions. We apply the two tactics to agents trained by state-of-the-art deep reinforcement learning algorithms, including DQN and A3C. In 5 Atari games, our strategically-timed attack reduces the reward as much as the uniform attack (i.e., attacking at every time step) does while attacking the agent 4 times less often. Our enchanting attack lures the agent toward designated target states with a more than 70% success rate. Videos are available at http://yenchenlin.me/adversarial_attack_RL/
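The planning step of the enchanting attack can be illustrated with a small cross-entropy search over action sequences, the planner named in the TL;DR. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function `plan_actions`, its parameters, and the one-dimensional toy dynamics that stand in for the learned video prediction model are all hypothetical.

```python
import random


def plan_actions(state, target, predict_next, distance, n_actions, horizon,
                 n_samples=200, n_elite=20, n_iters=5, seed=0):
    """Cross-entropy search for an action sequence that ends near `target`.

    `predict_next(state, action)` is a hypothetical stand-in for the
    generative model; `distance(state, target)` scores a predicted state.
    """
    rng = random.Random(seed)
    # probs[t][a] is the probability of sampling action a at planning step t.
    probs = [[1.0 / n_actions] * n_actions for _ in range(horizon)]
    best_seq, best_cost = None, float("inf")

    for _ in range(n_iters):
        scored = []
        for _ in range(n_samples):
            seq = [rng.choices(range(n_actions), weights=probs[t])[0]
                   for t in range(horizon)]
            s = state
            for a in seq:  # roll the sequence out through the model
                s = predict_next(s, a)
            scored.append((distance(s, target), seq))

        scored.sort(key=lambda pair: pair[0])
        if scored[0][0] < best_cost:
            best_cost, best_seq = scored[0]

        # Refit the per-step distribution to the lowest-cost (elite) samples.
        elites = [seq for _, seq in scored[:n_elite]]
        for t in range(horizon):
            counts = [1e-3] * n_actions  # small smoothing term
            for seq in elites:
                counts[seq[t]] += 1.0
            total = sum(counts)
            probs[t] = [c / total for c in counts]

    return best_seq, best_cost


# Toy dynamics (assumption): the state is an integer position and the
# actions 0, 1, 2 move it by -1, 0, +1 respectively.
def toy_step(s, a):
    return s + (a - 1)


def toy_distance(s, target):
    return abs(s - target)


plan, cost = plan_actions(0, 3, toy_step, toy_distance, n_actions=3, horizon=5)
```

In the attack described in the abstract, each action of the planned sequence would then serve as the target action of one adversarial example; that crafting step is not shown here.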

Paper Structure

This paper contains 16 sections, 5 equations, 4 figures.

Figures (4)

  • Figure 1: Illustration of the strategically-timed attack on Pong. We use a function $c$ to compute the preference of the agent in taking the most preferred action over the least preferred action at the current state $s_t$. A large preference value implies an immediate reward. In the bottom panel, we plot $c(s_t)$. Our proposed strategically-timed attack launches an attack on a deep RL agent when the preference is greater than or equal to a threshold, $c(s_t)\ge\beta$ (red dashed line). When a small perturbation is added to the observation at $s_{84}$ (where $c(s_{84})\ge\beta$), the agent changes its action from up to down and eventually misses the ball. But when the perturbation is added to the observation at $s_{25}$ (where $c(s_{25})<\beta$), there is no impact on the reward.
  • Figure 2: Illustration of the enchanting attack on Ms. Pacman. The blue panel on the right shows the flow of the attack starting at $s_t$: (1) action sequence planning, (2) crafting an adversarial example with a target action, (3) the agent takes an action, and (4) the environment generates the next state $s_{t+1}$. The green panel on the left shows that the video prediction model is trained from unlabeled video. The white panel in the middle shows the adversary starting at $s_t$ and using the prediction model to plan the attack.
  • Figure 3: Accumulated reward (y-axis) vs. portion of time steps at which the agent is attacked (x-axis) for the strategically-timed attack in 5 games. The blue and green curves correspond to results of A3C and DQN, respectively. A larger reward means the deep RL agent is more robust to the strategically-timed attack.
  • Figure 4: Success rate (y-axis) vs. $H$ steps in the future (x-axis) for the enchanting attack in 5 games. The blue and green curves correspond to results of A3C and DQN, respectively. A lower rate means that the deep RL agent is more robust to the enchanting attack.
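The timing rule in the Figure 1 caption, attack only when $c(s_t)\ge\beta$, can be sketched in a few lines. The caption does not give a formula for $c$, so the sketch assumes one natural reading: the gap between the highest and lowest action probabilities. For a value-based agent, it further assumes that Q-values are first turned into probabilities with a temperature softmax. All function names are illustrative.

```python
import math


def softmax(values, temperature=1.0):
    """Convert scores (e.g. Q-values) into a probability distribution."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def preference(action_probs):
    """Assumed form of c(s_t): most preferred minus least preferred action."""
    return max(action_probs) - min(action_probs)


def should_attack(action_probs, beta):
    """Launch an attack only when the preference reaches the threshold beta."""
    return preference(action_probs) >= beta


# A decisive state (strong preference) triggers the attack ...
decisive = should_attack([0.90, 0.05, 0.05], beta=0.5)
# ... while a near-indifferent state does not.
indifferent = should_attack([0.40, 0.30, 0.30], beta=0.5)
# Value-based agent: map Q-values to probabilities first (assumption).
q_based = should_attack(softmax([2.0, 0.0, -2.0]), beta=0.5)
```

Raising `beta` makes the adversary attack at fewer time steps, which corresponds to moving left along the x-axis of Figure 3.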