Crafting Adversarial Input Sequences for Recurrent Neural Networks
Nicolas Papernot, Patrick McDaniel, Ananthram Swami, Richard Harang
TL;DR
The paper addresses the vulnerability of recurrent neural networks (RNNs) to adversarial input sequences by extending adversarial-sample techniques from static inputs to sequential data. It adapts the fast gradient sign method and the forward-derivative (Jacobian) approach, unfolding the computational graph so that sensitivities can be computed despite the cycles in an RNN. In experiments on an LSTM-based sentiment classifier and a synthetic sequence-to-sequence model, it shows that carefully crafted perturbations (around 9 word changes on average in long reviews) can force incorrect categorical and sequential outputs at test time without retraining the model. The work highlights practical security implications for RNN deployments and outlines directions for future work: broader data types, black-box settings, and defenses.
Abstract
Machine learning models are frequently used to solve complex security problems, as well as to make decisions in sensitive situations like guiding autonomous vehicles or predicting financial market behaviors. Previous efforts have shown that numerous machine learning models are vulnerable to adversarial manipulations of their inputs taking the form of adversarial samples. Such inputs are crafted by adding carefully selected perturbations to legitimate inputs so as to force the machine learning model to misbehave, for instance by outputting a wrong class if the machine learning task of interest is classification. In fact, to the best of our knowledge, all previous work on adversarial sample crafting for neural networks considered models used to solve classification tasks, most frequently in computer vision applications. In this paper, we contribute to the field of adversarial machine learning by investigating adversarial input sequences for recurrent neural networks processing sequential data. We show that the classes of algorithms introduced previously to craft adversarial samples misclassified by feed-forward neural networks can be adapted to recurrent neural networks. In an experiment, we show that adversaries can craft adversarial sequences misleading both categorical and sequential recurrent neural networks.
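
To make the idea of adapting a gradient-based attack to a recurrent model more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a toy scalar RNN, h_t = tanh(w*h_{t-1} + u*x_t), with a sigmoid output and hypothetical weights w, u, v. The paper's experiments use LSTMs over word sequences, which this sketch does not reproduce. The recurrence is unfolded over the time steps, the cross-entropy loss is differentiated with respect to every input step, and each step is then moved by eps in the direction of the gradient sign.

```python
import math

def forward(xs, w, u, v):
    """Unfold a scalar RNN over the sequence and return hidden states and output.

    Assumed toy model: h_t = tanh(w*h_{t-1} + u*x_t), p = sigmoid(v*h_T).
    """
    hs = [0.0]  # h_0
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    p = 1.0 / (1.0 + math.exp(-v * hs[-1]))
    return hs, p

def input_gradient(xs, y, w, u, v):
    """Gradient of the cross-entropy loss w.r.t. each input step x_t,
    obtained by walking the unfolded graph backwards in time."""
    hs, p = forward(xs, w, u, v)
    dh = (p - y) * v                      # dL/dh_T for sigmoid + cross-entropy
    grads = [0.0] * len(xs)
    for t in range(len(xs), 0, -1):
        da = dh * (1.0 - hs[t] ** 2)      # back through tanh at step t
        grads[t - 1] = da * u             # dL/dx_t
        dh = da * w                       # propagate to h_{t-1}
    return grads

def fgsm_sequence(xs, y, w, u, v, eps):
    """Fast-gradient-sign style perturbation applied to every step of the sequence."""
    def sign(g):
        return (g > 0) - (g < 0)
    grads = input_gradient(xs, y, w, u, v)
    return [x + eps * sign(g) for x, g in zip(xs, grads)]

# Hypothetical weights and input, chosen only for illustration.
w, u, v = 0.9, 1.2, 2.0
xs, y = [0.5, -0.3, 0.8], 1
adv = fgsm_sequence(xs, y, w, u, v, eps=0.1)
print("clean p(y=1):", forward(xs, w, u, v)[1])
print("adversarial p(y=1):", forward(adv, w, u, v)[1])
```

With these illustrative values the perturbed sequence lowers the model's confidence in the true label. For discrete inputs such as words, a continuous step like this is not directly applicable, and a further mapping from the gradient direction back to valid tokens would be needed.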
