
Crafting Adversarial Input Sequences for Recurrent Neural Networks

Nicolas Papernot, Patrick McDaniel, Ananthram Swami, Richard Harang

TL;DR

The paper addresses the vulnerability of recurrent neural networks to adversarial input sequences by extending adversarial-sample techniques from static inputs to sequential data. It introduces adaptations of the fast gradient sign method and the forward-derivative (Jacobian) approach, using computational graph unfolding to compute sensitivities in cyclical RNNs. Through experiments on an LSTM-based sentiment classifier and a synthetic sequence-to-sequence model, it demonstrates that carefully crafted perturbations—averaging around 9 word changes in long reviews—can force incorrect categorical and sequential outputs at test time without retraining. The work highlights practical security implications for RNN deployments and outlines avenues for broader data types, black-box settings, and defenses.
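The TL;DR names two techniques carried over from feed-forward networks. Below is a minimal sketch of the first one, the fast gradient sign method ($\vec{x}^* = \vec{x} + \varepsilon\,\mathrm{sgn}(\nabla_{\vec{x}} c)$), applied to the single-neuron RNN described in the Figure 1 caption. Unfolding the recurrence turns the cycle into an ordinary chain of time steps, so the gradient of the cost with respect to every input value $x^{(t)}$ can be backpropagated one step at a time. Several things here are assumptions and do not come from the paper: the parameter values, the zero initial state, the squared-error cost on the final output $y^{(T)}$, and the helper names `forward`, `cost`, `input_gradient`, and `fgsm`.

```python
import math
from typing import List, Tuple

# Hypothetical scalar parameters for the single-neuron RNN of Figure 1
# (w_in, w, w_out, b_h, b_y); the values are illustrative only.
W_IN, W, W_OUT, B_H, B_Y = 0.8, 0.5, 1.2, 0.1, -0.2


def forward(x: List[float]) -> Tuple[List[float], float]:
    """Unfold the recurrence over len(x) steps; return all h^(t) and y^(T)."""
    h = 0.0  # assumed initial state h^(0) = 0
    hs = []
    for x_t in x:
        h = math.tanh(W_IN * x_t + W * h + B_H)
        hs.append(h)
    return hs, W_OUT * hs[-1] + B_Y


def cost(x: List[float], target: float) -> float:
    """Assumed cost: squared error of the final output against a target."""
    _, y = forward(x)
    return 0.5 * (y - target) ** 2


def input_gradient(x: List[float], target: float) -> List[float]:
    """d cost / d x^(t) for every t, backpropagated through the unfolded graph."""
    hs, y = forward(x)
    grad = [0.0] * len(x)
    dh = (y - target) * W_OUT  # d cost / d h^(T)
    for t in reversed(range(len(x))):
        da = dh * (1.0 - hs[t] ** 2)  # through tanh at step t
        grad[t] = da * W_IN           # sensitivity to the input x^(t)
        dh = da * W                   # carried back to h^(t-1)
    return grad


def fgsm(x: List[float], target: float, eps: float) -> List[float]:
    """Fast gradient sign step on every time step: x* = x + eps * sign(grad)."""
    g = input_gradient(x, target)
    return [x_t + eps * ((g_t > 0) - (g_t < 0)) for x_t, g_t in zip(x, g)]


if __name__ == "__main__":
    x, target = [0.5, -0.3, 0.9], 1.0
    x_adv = fgsm(x, target, eps=0.1)
    print("clean cost:", cost(x, target), "adversarial cost:", cost(x_adv, target))
```

With these toy values, every time step moves by exactly `eps` and the cost on the perturbed sequence exceeds the clean cost. Taking only the sign of the gradient is what separates FGSM from a plain gradient step: the perturbation budget is spread evenly across all time steps.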

Abstract

Machine learning models are frequently used to solve complex security problems, as well as to make decisions in sensitive situations like guiding autonomous vehicles or predicting financial market behaviors. Previous efforts have shown that numerous machine learning models are vulnerable to adversarial manipulations of their inputs taking the form of adversarial samples. Such inputs are crafted by adding carefully selected perturbations to legitimate inputs so as to force the machine learning model to misbehave, for instance by outputting a wrong class if the machine learning task of interest is classification. In fact, to the best of our knowledge, all previous work on adversarial sample crafting for neural networks considered models used to solve classification tasks, most frequently in computer vision applications. In this paper, we contribute to the field of adversarial machine learning by investigating adversarial input sequences for recurrent neural networks processing sequential data. We show that the classes of algorithms introduced previously to craft adversarial samples misclassified by feed-forward neural networks can be adapted to recurrent neural networks. In an experiment, we show that adversaries can craft adversarial sequences misleading both categorical and sequential recurrent neural networks.
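For the categorical case (the LSTM movie-review classifier of Figure 3), inputs are words, so a perturbation computed in embedding space has to be mapped back onto a real dictionary word. The sketch below shows one plausible gradient-sign-guided substitution loop. It is not a reproduction of the paper's algorithm, and everything in it is hypothetical: the 2-dimensional embeddings in `VOCAB`, the linear scorer `score` with weights `THETA` (standing in for the LSTM), and the helpers `substitute_word` and `craft`. It also does not reproduce the roughly 9 word changes reported for long reviews.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]

# Hypothetical 2-d word embeddings and linear scorer weights, standing in
# for the embedding layer and LSTM of Figure 3 (illustration only).
VOCAB: Dict[str, Vector] = {
    "good": (1.0, 1.0),
    "bad": (-1.0, -1.0),
    "fine": (0.5, 1.2),
    "great": (1.5, 0.9),
}
THETA: Vector = (0.7, 0.2)


def sign(v: float) -> int:
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def score(review: Sequence[str]) -> float:
    """Toy 'positive sentiment' score; > 0 means the review is classified positive."""
    return sum(sum(t * e for t, e in zip(THETA, VOCAB[w])) for w in review)


def substitute_word(current: str, grad: Sequence[float]) -> str:
    """Choose the dictionary word whose offset from `current` best follows -sign(grad)."""
    e = VOCAB[current]
    desired = [-sign(g) for g in grad]  # direction that lowers the current score

    def key(word: str):
        z = VOCAB[word]
        # Distance between the sign pattern of (z - e) and the desired pattern.
        mismatch = sum(abs(sign(zk - ek) - dk) for zk, ek, dk in zip(z, e, desired))
        # Ties: prefer the smaller move in embedding space, then alphabetical order.
        return (mismatch, math.dist(z, e), word)

    return min((w for w in VOCAB if w != current), key=key)


def craft(review: Sequence[str]) -> Tuple[List[str], int]:
    """Replace words left to right until the toy classifier no longer says positive."""
    adv = list(review)
    changes = 0
    for i in range(len(adv)):
        if score(adv) <= 0:  # prediction already flipped
            break
        # For this linear scorer the gradient w.r.t. every word embedding is THETA.
        adv[i] = substitute_word(adv[i], THETA)
        changes += 1
    return adv, changes


if __name__ == "__main__":
    adv, n = craft(["good", "fine", "great", "good"])
    print(adv, n, score(adv))
```

For the sample review, the loop flips the toy prediction after three substitutions and leaves the last word untouched. With a real LSTM, the gradient would differ at every position and would have to be recomputed after each change. Here it stays constant only because the scorer is linear.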

Paper Structure

This paper contains 11 sections, 10 equations, 4 figures, and 1 algorithm.

Figures (4)

  • Figure 1: Recurrent Neural Network: the sequential input $\vec{x}$ is processed one time-step value $x^{(t)}$ at a time. The hidden neuron evaluates its state $h^{(t)}$ at time step $t$ by summing (1) the product of the current input value $x^{(t)}$ and weight $\vec{w}_{in}$, (2) the product of its previous state and weight $\vec{w}$, and (3) the bias $b_h$, then applying the hyperbolic tangent. The output $y^{(t)}$ multiplies the hidden neuron state by weight $\vec{w}_{out}$ and adds bias $b_y$.
  • Figure 2: Unfolded Recurrent Neural Network: this neural network is identical to the one depicted in Figure 1, with the exception of its recurrence cycle, which is now unfolded. Biases are omitted for clarity of the illustration.
  • Figure 3: LSTM-based RNN: this recurrent model classifies movie reviews.
  • Figure 4: Example input and output sequences of our experimental setup. In the input graph, solid lines indicate the legitimate input sequence, while dashed lines indicate the crafted adversarial sequence. In the output graph, solid lines indicate the training target output, dotted lines indicate the model's predictions, and dashed lines indicate the model's prediction on the adversarial sequence.
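Written out for the single hidden neuron of Figure 1, the recurrence and the chain rule applied to the unfolded graph of Figure 2 take the form below. The weights $\vec{w}_{in}$, $\vec{w}$, $\vec{w}_{out}$ are treated as scalars $w_{in}$, $w$, $w_{out}$. This is a sketch for that toy network, not a transcription of the paper's equations. The last line is the per-time-step sensitivity that a forward-derivative (Jacobian) approach needs.

```latex
\begin{aligned}
h^{(t)} &= \tanh\!\left(w_{in}\, x^{(t)} + w\, h^{(t-1)} + b_h\right), \\
y^{(t)} &= w_{out}\, h^{(t)} + b_y, \\
\frac{\partial y^{(T)}}{\partial x^{(t)}} &= w_{out}
  \left[\prod_{k=t+1}^{T} w \left(1 - \big(h^{(k)}\big)^2\right)\right]
  \left(1 - \big(h^{(t)}\big)^2\right) w_{in}.
\end{aligned}
```

For $t = T$ the product is empty and the derivative reduces to $w_{out}\,(1 - (h^{(T)})^2)\,w_{in}$. Each later time step contributes one factor $w\,(1 - (h^{(k)})^2)$. This is exactly the dependency that unfolding makes explicit.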