Adversarial Transformation Networks: Learning to Generate Adversarial Examples

Shumeet Baluja; Ian Fischer

TL;DR

Adversarial Transformation Networks (ATNs) are feed-forward networks trained in a self-supervised manner to transform an input into a targeted adversarial example against a fixed target model, which makes generation fast to execute. The paper introduces two variants, the Perturbation ATN (P-ATN) and the Adversarial Autoencoding ATN (AAE-ATN), and uses a reranking-based loss that enforces the targeted output while preserving the relative order of the other class predictions. Experiments on MNIST and ImageNet (Inception-ResNet-v2) show architecture-dependent trade-offs between perturbation locality and adversarial diversity, and also examine transfer and insider-information effects. The authors suggest that ATNs could support adversarial training, defense research, and the study of how classifiers encode target concepts, and they outline directions for future enhancements and black-box extensions.

Abstract

Multiple approaches to generating adversarial examples have been proposed to attack deep neural networks. These approaches involve either directly computing gradients with respect to the image pixels, or directly solving an optimization on the image pixels. In this work, we present a fundamentally new method for generating adversarial examples that is fast to execute and provides exceptional diversity of output. We efficiently train feed-forward neural networks in a self-supervised manner to generate adversarial examples against a target network or set of networks. We call such a network an Adversarial Transformation Network (ATN). ATNs are trained to generate adversarial examples that minimally modify the classifier's outputs given the original input, while constraining the new classification to match an adversarial target class. We present methods to train ATNs and analyze their effectiveness targeting a variety of MNIST classifiers as well as the latest state-of-the-art ImageNet classifier Inception ResNet v2.
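
The idea of changing the classifier's output as little as possible while forcing a chosen target class can be illustrated with a small sketch. It assumes a reranking target that raises the target class to `alpha` times the current top probability and then renormalizes; the names `rerank_target` and `output_loss`, the squared-error loss, and the value of `alpha` are illustrative assumptions, not details stated in this summary.

```python
def rerank_target(y, target, alpha=1.5):
    """Return a reranked copy of the probability vector y.

    The target class is raised to alpha * max(y), then the vector is
    renormalized so it sums to 1. Every other entry is scaled by the same
    factor, so the relative order of the non-target classes is unchanged.
    (Assumed form of the reranking; alpha > 1 is a hypothetical setting.)
    """
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1 so the target becomes the top class")
    boosted = list(y)
    boosted[target] = alpha * max(y)
    total = sum(boosted)
    return [p / total for p in boosted]


def output_loss(y_adv, y_reranked):
    """Squared-error distance between the classifier's output on the
    adversarial input and the reranked target vector (assumed loss form)."""
    return sum((p - q) ** 2 for p, q in zip(y_adv, y_reranked))


# Classifier output on the clean input: class 0 is currently on top.
y = [0.7, 0.2, 0.1]
r = rerank_target(y, target=2)

# An ATN would be trained so that the classifier's output on the
# transformed input moves toward r; here we only compare two candidates.
loss_unchanged = output_loss(y, r)   # classifier output did not move
loss_matched = output_loss(r, r)     # classifier output equals the target
```

In this sketch the loss is zero only when the classifier's output on the transformed input matches the reranked vector, which places the target class first while leaving the ordering of the remaining classes as it was for the original input.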
