Adversarial Doodles: Interpretable and Human-drawable Attacks Provide Describable Insights

Ryoya Nara; Yusuke Matsui

TL;DR

We propose Adversarial Doodles, adversarial attacks with interpretable, human-drawable shapes. They provide describable insights into the relationship between a doodle's shape and the classifier's output. For example, when three small circles are added to a helicopter image, the ResNet-50 classifier mistakenly classifies it as an airplane.

Abstract

DNN-based image classifiers are susceptible to adversarial attacks. Most previous adversarial attacks do not have clear patterns, which makes it difficult to interpret their results and gain insights into classifiers' mechanisms. We therefore propose Adversarial Doodles, which have interpretable shapes. We optimize black Bézier curves that fool the classifier when overlaid onto the input image. By introducing random affine transformations and regularizing the doodled area, we obtain small-sized attacks that cause misclassification even when humans replicate them by hand. Adversarial Doodles provide describable insights into the relationship between the human-drawn doodle's shape and the classifier's output, such as "When we add three small circles on a helicopter image, the ResNet-50 classifier mistakenly classifies it as an airplane."
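To make the ingredients named in the abstract concrete, the following is a minimal sketch of two of them: sampling points along a Bézier stroke and perturbing that stroke with a random affine transformation. It is not the authors' implementation. The choice of cubic curves, the normalized [0, 1] coordinates, the function names, and the parameter ranges are all assumptions made for illustration, and the classifier, the differentiable rasterizer, and the area regularizer are omitted.

```python
import math
import random


def cubic_bezier(p0, p1, p2, p3, t):
    """Return the point at parameter t in [0, 1] on a cubic Bezier curve."""
    s = 1.0 - t
    w0, w1, w2, w3 = s ** 3, 3 * s ** 2 * t, 3 * s * t ** 2, t ** 3
    x = w0 * p0[0] + w1 * p1[0] + w2 * p2[0] + w3 * p3[0]
    y = w0 * p0[1] + w1 * p1[1] + w2 * p2[1] + w3 * p3[1]
    return (x, y)


def sample_stroke(control_points, n=16):
    """Sample n evenly spaced parameter values along one stroke (n >= 2)."""
    return [cubic_bezier(*control_points, i / (n - 1)) for i in range(n)]


def random_affine(rng, max_rot_deg=10.0, max_shift=0.05, max_scale=0.1):
    """Draw a small rotation, uniform scale, and translation.

    The ranges are hypothetical defaults, not values from the paper.
    """
    theta = math.radians(rng.uniform(-max_rot_deg, max_rot_deg))
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    tx = rng.uniform(-max_shift, max_shift)
    ty = rng.uniform(-max_shift, max_shift)
    a, b = scale * math.cos(theta), -scale * math.sin(theta)
    c, d = scale * math.sin(theta), scale * math.cos(theta)
    return (a, b, tx, c, d, ty)


def apply_affine(m, point, center=(0.5, 0.5)):
    """Apply the affine map m to a point, rotating and scaling about center."""
    a, b, tx, c, d, ty = m
    x, y = point[0] - center[0], point[1] - center[1]
    return (a * x + b * y + center[0] + tx, c * x + d * y + center[1] + ty)


if __name__ == "__main__":
    rng = random.Random(0)
    stroke = [(0.2, 0.2), (0.4, 0.8), (0.6, 0.8), (0.8, 0.2)]
    m = random_affine(rng)
    jittered = [apply_affine(m, p) for p in stroke]
    for point in sample_stroke(jittered, n=5):
        print(point)
```

Because Bézier curves are affine invariant, transforming the four control points and then sampling gives the same points as sampling first and transforming afterwards. In a sketch like this, the jitter can therefore be applied to the curve parameters directly, which is one plausible way to make an optimized doodle tolerant of the small misalignments that occur when a person redraws it by hand.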

Paper Structure (29 sections, 3 equations, 19 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Interpretability of Adversarial Examples
Shapes of Adversarial Attacks
Adversarial Attack Replicated by Humans
Preliminaries
Bézier Curves
Differentiable Rasterizer
Approach
Formulation
Optimization
Overview
Algorithm
Experiments
Dataset and Classifiers
...and 14 more sections

Figures (19)

Figure 1: Replication examples.
Figure 2: Attacks on other images.
Figure 4: Overview of our proposed method.
Figure 5: Human replication settings with a tablet. A human subject displays an adversarial doodle optimized by a computer on the PC's screen and replicates it to draw black strokes with a tablet.
Figure 6: Success cases.
...and 14 more figures
