Adversarial Examples that Fool both Computer Vision and Time-Limited Humans

Gamaleldin F. Elsayed; Shreya Shankar; Brian Cheung; Nicolas Papernot; Alex Kurakin; Ian Goodfellow; Jascha Sohl-Dickstein

TL;DR

The paper investigates whether adversarial examples that fool convolutional neural networks (CNNs) also bias the perception of time-limited human observers. It combines black-box transfer attacks with a retina-inspired preprocessing layer and a psychophysics setup that tests humans under brief viewing conditions. The results show that perturbations that transfer across CNN ensembles also bias human judgments and increase human error rates, which suggests that artificial and biological vision share some of the same vulnerabilities. These findings bear on both ML security and neuroscience, and they point to future work that uses brain-like processing to improve model robustness and to study human perception under adversarial conditions.

Abstract

Machine learning models are vulnerable to adversarial examples: small changes to images can cause computer vision models to make mistakes such as identifying a school bus as an ostrich. However, it is still an open question whether humans are prone to similar mistakes. Here, we address this question by leveraging recent techniques that transfer adversarial examples from computer vision models with known parameters and architecture to other models with unknown parameters and architecture, and by matching the initial processing of the human visual system. We find that adversarial examples that strongly transfer across computer vision models influence the classifications made by time-limited human observers.
