Advantage-based Temporal Attack in Reinforcement Learning

Shenghong He

TL;DR

The paper proposes the Advantage-based Adversarial Transformer (AAT), which generates adversarial examples with stronger temporal correlations to improve attack performance, and introduces a weighted advantage mechanism that quantifies the effectiveness of a perturbation in a given state.

Abstract

Extensive research demonstrates that Deep Reinforcement Learning (DRL) models are susceptible to adversarially constructed inputs (i.e., adversarial examples), which can mislead the agent to take suboptimal or unsafe actions. Recent methods improve attack effectiveness by leveraging future rewards to guide adversarial perturbation generation over sequential time steps (i.e., reward-based attacks). However, these methods are unable to capture dependencies between different time steps in the perturbation generation process, resulting in a weak temporal correlation between the current perturbation and previous perturbations. In this paper, we propose a novel method called Advantage-based Adversarial Transformer (AAT), which can generate adversarial examples with stronger temporal correlations (i.e., time-correlated adversarial examples) to improve the attack performance. AAT employs a multi-scale causal self-attention (MSCSA) mechanism to dynamically capture dependencies between historical information from different time periods and the current state, thus enhancing the correlation between the current perturbation and the previous perturbation. Moreover, AAT introduces a weighted advantage mechanism, which quantifies the effectiveness of a perturbation in a given state and guides the generation process toward high-performance adversarial examples by sampling high-advantage regions. Extensive experiments demonstrate that the performance of AAT matches or surpasses mainstream adversarial attack baselines on Atari, DeepMind Control Suite and Google football tasks.
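
As a rough illustration of the multi-scale causal self-attention idea described in the abstract, the sketch below runs causal dot-product attention over several history windows and averages the results. The abstract does not specify how MSCSA is built, so everything here is an assumption: the window sizes, the averaging fusion rule, the use of identity projections instead of learned query/key/value weights, and the function names are all hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_window_attention(seq, window):
    """Dot-product attention where step t attends only to the last
    `window` steps up to and including t (never to future steps).
    Queries, keys and values are the raw vectors (no learned projections)."""
    d = len(seq[0])
    out = []
    for t in range(len(seq)):
        start = max(0, t - window + 1)
        keys = seq[start:t + 1]
        scores = [sum(q * k for q, k in zip(seq[t], key)) / math.sqrt(d)
                  for key in keys]
        weights = softmax(scores)
        out.append([sum(w * key[i] for w, key in zip(weights, keys))
                    for i in range(d)])
    return out

def multi_scale_causal_attention(seq, windows=(2, 4, 8)):
    """Average the outputs of several causal windows.
    The averaging rule is a hypothetical choice, not the paper's."""
    per_scale = [causal_window_attention(seq, w) for w in windows]
    d = len(seq[0])
    return [[sum(scale[t][i] for scale in per_scale) / len(per_scale)
             for i in range(d)]
            for t in range(len(seq))]

# Toy history of four 2-dimensional state embeddings (made-up values).
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
mixed = multi_scale_causal_attention(history)
```

Because each window is causal, changing a later entry of `history` leaves the outputs for earlier time steps unchanged, which is the property that lets the current perturbation depend on past information only.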

Paper Structure (34 sections, 2 theorems, 20 equations, 17 figures, 18 tables, 1 algorithm)

This paper contains 34 sections, 2 theorems, 20 equations, 17 figures, 18 tables, 1 algorithm.

Key Result

Lemma 1

(The Advantage Performance Difference Lemma [NIPS2016_cc7e2b87, kakade2002approximately]) For any attack policies $\pi$ and $\pi'$, we have $V^\pi(s)-V^{\pi'}(s)=\sum_{t=0}^\infty \gamma^t \mathbb{E}_{\tau \sim \pi}\hat{A}^{\pi'}(s,\delta)$, where $\hat{A}^{\pi'}$ is the advantage function under policy $\pi'$ ...
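
The identity in Lemma 1 can be checked numerically on a small tabular problem. The sketch below is only a sanity check under stated assumptions: the two-state, two-action MDP and both policies are made-up numbers, generic actions stand in for the perturbations $\delta$, and the right-hand side is evaluated along trajectories as $\sum_t \gamma^t \mathbb{E}_{\tau\sim\pi}[A^{\pi'}(s_t, a_t)]$.

```python
GAMMA = 0.9
N_STATES, N_ACTIONS = 2, 2

# Hypothetical MDP: P[s][a] is the next-state distribution, R[s][a] the reward.
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.3, 0.7]]]
R = [[1.0, 0.0],
     [0.5, 2.0]]

def q_value(v, s, a):
    """One-step lookahead: reward plus discounted expected next value."""
    return R[s][a] + GAMMA * sum(P[s][a][s2] * v[s2] for s2 in range(N_STATES))

def evaluate(policy, iters=2000):
    """Iterative policy evaluation; policy[s][a] is an action probability."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        v = [sum(policy[s][a] * q_value(v, s, a) for a in range(N_ACTIONS))
             for s in range(N_STATES)]
    return v

def advantage(policy):
    """A(s, a) = Q(s, a) - V(s) under the given policy."""
    v = evaluate(policy)
    return [[q_value(v, s, a) - v[s] for a in range(N_ACTIONS)]
            for s in range(N_STATES)]

def discounted_advantage_sum(pi, adv, start, horizon=2000):
    """Sum over t of gamma^t times the expected advantage at step t when
    following pi, computed by propagating the state distribution."""
    dist = [1.0 if s == start else 0.0 for s in range(N_STATES)]
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        total += discount * sum(dist[s] * pi[s][a] * adv[s][a]
                                for s in range(N_STATES)
                                for a in range(N_ACTIONS))
        dist = [sum(dist[s] * pi[s][a] * P[s][a][s2]
                    for s in range(N_STATES) for a in range(N_ACTIONS))
                for s2 in range(N_STATES)]
        discount *= GAMMA
    return total

pi = [[0.9, 0.1], [0.2, 0.8]]          # policy generating the trajectories
pi_prime = [[0.5, 0.5], [0.5, 0.5]]    # policy whose advantage is used

lhs = evaluate(pi)[0] - evaluate(pi_prime)[0]
rhs = discounted_advantage_sum(pi, advantage(pi_prime), start=0)
print(abs(lhs - rhs))  # difference should be at the level of rounding error
```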

Figures (17)

  • Figure 1: The AAT training structure. $\oplus$ denotes the concatenation of vectors, $\otimes$ denotes the element-wise multiplication of vectors, and MLP is a multilayer perceptron. During the training phase, AAT generates adversarial perturbations using the weighted advantage calculated from the $Q_\theta$ and $V_\psi$ value functions.
  • Figure 2: All experimental results are averaged over 10 runs. In the black-box experiments, adversarial examples are generated using A3C and D4PG, respectively, as substitute policies.
  • Figure 3: The performance of different attack methods in Gfootball. The expected cumulative rewards achieved by DQN and PPO target policies are 7.12 and 7.88 respectively.
  • Figure 4: Visualization of immediate rewards for the target policy. The x-axis represents the time steps from the start to the end of the target policy's execution in the Gfootball environment, and the y-axis is the reward value obtained at each time step.
  • Figure 5: The impact of data scale on AAT. The x-axis represents the scale of the data. For example, 5 represents 5,000 historical trajectories.
  • ...and 12 more figures
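
The Figure 1 caption says the weighted advantage is computed from the $Q_\theta$ and $V_\psi$ value functions. A minimal sketch of one way this could look is given below, assuming the advantage is $Q - V$ and the weight is a clipped exponential of that advantage (a common advantage-weighting choice, not confirmed by this page). The temperature `beta`, the clip `max_weight`, the candidate names, and both function names are hypothetical.

```python
import math
import random

def weighted_advantages(q_values, v_value, beta=1.0, max_weight=20.0):
    """For each candidate perturbation: advantage = Q - V, and
    weight = min(exp(advantage / beta), max_weight).
    The exponential weighting is an assumed form, not the paper's."""
    advantages = [q - v_value for q in q_values]
    weights = [min(math.exp(a / beta), max_weight) for a in advantages]
    return advantages, weights

def sample_high_advantage(candidates, weights, rng):
    """Draw one candidate with probability proportional to its weight,
    so high-advantage perturbations are sampled more often."""
    return rng.choices(candidates, weights=weights, k=1)[0]

rng = random.Random(0)
candidates = ["delta_a", "delta_b", "delta_c"]   # placeholder perturbations
advantages, weights = weighted_advantages([1.0, 2.5, 0.5], v_value=1.5)
chosen = sample_high_advantage(candidates, weights, rng)
```

Clipping the weight keeps a single very large advantage estimate from dominating the sampling distribution; whether AAT uses such a clip is not stated here.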

Theorems & Definitions (3)

  • Lemma 1
  • Theorem 1
  • Proof 1