Adversarial Doodles: Interpretable and Human-drawable Attacks Provide Describable Insights

Ryoya Nara, Yusuke Matsui

TL;DR

This paper proposes Adversarial Doodles: attacks with interpretable shapes that yield describable insights into how the shape of a human-drawn doodle affects a classifier's output. For example, adding three small circles to a helicopter image makes the ResNet-50 classifier misclassify it as an airplane.

Abstract

DNN-based image classifiers are susceptible to adversarial attacks. Most previous adversarial attacks do not have clear patterns, making it difficult to interpret the attacks' results and gain insights into the classifiers' mechanisms. Therefore, we propose Adversarial Doodles, which have interpretable shapes. We optimize black Bézier curves to fool the classifier by overlaying them onto the input image. By introducing random affine transformations and regularizing the doodled area, we obtain small-sized attacks that cause misclassification even when humans replicate them by hand. Adversarial doodles provide describable insights into the relationship between the human-drawn doodle's shape and the classifier's output, such as "When we add three small circles on a helicopter image, the ResNet-50 classifier mistakenly classifies it as an airplane."
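
The rendering side of the pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It only covers how black Bézier strokes might be overlaid on an image, how a random affine transformation might be applied, and what an area term might look like; it does not include the classifier or the optimization loop. The Gaussian soft mask, the mean-mask area penalty, the parameter ranges, and all function names (`cubic_bezier`, `soft_stroke_mask`, `overlay_black`, `random_affine`) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def cubic_bezier(control_points, n_samples=200):
    """Sample n_samples points on a cubic Bezier curve from 4 control points (4, 2)."""
    p = np.asarray(control_points, dtype=float)
    t = np.linspace(0.0, 1.0, n_samples)[:, None]  # shape (n_samples, 1)
    return ((1 - t) ** 3 * p[0]
            + 3 * (1 - t) ** 2 * t * p[1]
            + 3 * (1 - t) * t ** 2 * p[2]
            + t ** 3 * p[3])  # shape (n_samples, 2), columns are (x, y)


def soft_stroke_mask(curves, size, sigma=1.0):
    """Return a (size, size) mask in [0, 1] that is close to 1 near any curve.

    Assumption: a Gaussian falloff of the distance to the nearest sampled
    curve point stands in for a differentiable rasterizer.
    """
    ys, xs = np.mgrid[0:size, 0:size]
    pixels = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(float)
    mask = np.zeros(size * size)
    for control_points in curves:
        pts = cubic_bezier(control_points)
        # Squared distance from every pixel to its nearest curve sample.
        d2 = ((pixels[:, None, :] - pts[None, :, :]) ** 2).sum(-1).min(axis=1)
        mask = np.maximum(mask, np.exp(-d2 / (2.0 * sigma ** 2)))
    return mask.reshape(size, size)


def overlay_black(image, mask):
    """Darken an (H, W, 3) image in [0, 1] toward black where mask is high."""
    return image * (1.0 - mask)[..., None]


def random_affine(rng, size, max_shift=2.0, max_rot=0.1, max_scale=0.05):
    """Return a function applying a random rotation, scale, and shift to points.

    The ranges here are illustrative guesses, not values from the paper.
    """
    angle = rng.uniform(-max_rot, max_rot)
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    shift = rng.uniform(-max_shift, max_shift, size=2)
    c, s = np.cos(angle), np.sin(angle)
    linear = scale * np.array([[c, -s], [s, c]])
    center = np.array([(size - 1) / 2.0, (size - 1) / 2.0])

    def apply(points):
        return (np.asarray(points, dtype=float) - center) @ linear.T + center + shift

    return apply


rng = np.random.default_rng(0)
curve = np.array([[4, 16], [12, 4], [20, 28], [28, 16]], dtype=float)
image = np.ones((32, 32, 3))            # plain white stand-in for an input image
warp = random_affine(rng, size=32)
mask = soft_stroke_mask([warp(curve)], size=32)
doodled = overlay_black(image, mask)    # this would be fed to the classifier
area_penalty = mask.mean()              # smaller value = smaller doodled area
```

Because a Bézier curve transformed by an affine map equals the curve of the transformed control points, the sketch warps the four control points rather than the rendered image. In a full attack, the control points would be updated to increase the classifier's error on `doodled` while keeping `area_penalty` small, which this sketch leaves out.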

Paper Structure

This paper contains 29 sections, 3 equations, 19 figures, 3 tables, 1 algorithm.

Figures (19)

  • Figure 1: Replication examples.
  • Figure 2: Attacks on other images.
  • Figure 4: Overview of our proposed method.
  • Figure 5: Human replication settings with a tablet. A human subject displays a computer-optimized adversarial doodle on the PC's screen and replicates it by drawing black strokes with a tablet.
  • Figure 6: Success cases.
  • ...and 14 more figures