In-distribution adversarial attacks on object recognition models using gradient-free search

Spandan Madan; Tomotake Sasaki; Hanspeter Pfister; Tzu-Mao Li; Xavier Boix

TL;DR

Problem: neural networks misclassify samples that lie inside the training distribution, which challenges the notion that such failures are solely due to dataset bias. Approach: CMA-Search, a gradient-free method based on the Covariance Matrix Adaptation Evolution Strategy, searches the vicinity of correctly classified samples in both parametric and real-world data to locate in-distribution misclassifications. Findings: across parametric, rendered, and natural image datasets, in-distribution adversarial examples are found at high attack rates, including over 71% for camera perturbations, 42% for lighting changes, and over 50% on natural datasets like Co3D, with corroborating results on ImageNet; even shift-invariant architectures remain vulnerable. Significance: the work reveals security risks from in-distribution adversarial examples and provides CMA-Search as a benchmarking tool, along with mitigation directions such as boundary-focused sampling, initialization strategies, and multi-view test-time inference.

Abstract

Neural networks are susceptible to small perturbations in the form of 2D rotations and shifts, image crops, and even changes in object colors. Past works attribute these errors to dataset bias, claiming that models fail on these perturbed samples because they do not belong to the training data distribution. Here, we challenge this claim and present evidence of the widespread existence of perturbed images within the training data distribution that networks fail to classify. We train models on data sampled from parametric distributions, then search inside this data distribution to find such in-distribution adversarial examples. This is done using our gradient-free, evolution strategies (ES) based approach, which we call CMA-Search. Despite training with a large-scale (0.5 million images), unbiased dataset of camera and light variations, CMA-Search can find a failure inside the data distribution in over 71% of cases by perturbing the camera position. With lighting changes, CMA-Search finds misclassifications in 42% of cases. These findings also extend to natural images from the ImageNet and Co3D datasets. This phenomenon of in-distribution adversarial examples presents a highly worrisome problem for artificial intelligence: such examples bypass the need for a malicious agent to add engineered noise to induce an adversarial attack. All code, datasets, and demos are available at https://github.com/Spandan-Madan/in_distribution_adversarial_examples.
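To make the search idea in the abstract concrete, the following is a minimal sketch of a gradient-free evolution strategy that starts from a correctly classified sample and looks for a misclassification without leaving the parameter range used for training. It is not the authors' CMA-Search implementation: it uses a fixed, isotropic step size with no covariance or step-size adaptation, and the classifier `true_class_prob`, the two-dimensional parameter space, the bounds, the 0.5 decision threshold, and all hyperparameters are hypothetical choices made only for illustration.

```python
import math
import random

# Assumed range of each scene parameter in the training distribution (hypothetical).
BOUNDS = (-1.0, 1.0)


def true_class_prob(params):
    """Hypothetical stand-in for a trained classifier.

    Returns the probability assigned to the correct class in a toy two-class
    problem. The toy model is correct everywhere except in a small pocket
    of radius 0.2 around (0.6, -0.3), which lies inside BOUNDS.
    """
    dist = math.hypot(params[0] - 0.6, params[1] + 0.3)
    return 1.0 / (1.0 + math.exp(-10.0 * (dist - 0.2)))


def es_search(start, sigma=0.1, n_offspring=20, n_parents=5, max_gens=100, seed=0):
    """Simplified evolution strategy (not full CMA-ES).

    Returns (point, generation) for the first misclassified in-distribution
    point found, or None if the generation budget runs out.
    """
    rng = random.Random(seed)
    mean = list(start)
    for gen in range(1, max_gens + 1):
        offspring = []
        for _ in range(n_offspring):
            # Sample around the current mean, then clip so the candidate
            # stays inside the training distribution's parameter range.
            child = [
                min(max(m + rng.gauss(0.0, sigma), BOUNDS[0]), BOUNDS[1])
                for m in mean
            ]
            offspring.append((true_class_prob(child), child))
        # Lower correct-class probability means closer to a failure.
        offspring.sort(key=lambda pair: pair[0])
        best_prob, best = offspring[0]
        if best_prob < 0.5:
            return best, gen
        # Move the mean to the average of the most promising candidates.
        parents = [child for _, child in offspring[:n_parents]]
        mean = [sum(p[i] for p in parents) / n_parents for i in range(len(mean))]
    return None


result = es_search(start=(0.0, 0.0))
if result is not None:
    point, generation = result
    print(f"misclassified at {point} after {generation} generation(s)")
```

Only forward passes of the model are needed, which is why a search of this kind can be applied when gradients with respect to scene parameters such as camera position or lighting are unavailable. A full CMA-ES would additionally adapt the sampling covariance and step size between generations.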
