SPOOF: Simple Pixel Operations for Out-of-Distribution Fooling

Ankit Gupta, Christoph Adami, Emily Dolson

TL;DR

This work examines how fragile state-of-the-art image classifiers are to out-of-distribution fooling images. It re-implements the classic CPPN-Fool and Direct-Fool attacks on contemporary models and introduces SPOOF, a minimal, fast black-box attack that uses sparse pixel updates. SPOOF consistently produces high-confidence misclassifications across CNNs and transformers (notably ViT-B/16) with few pixel changes and much lower computational cost than prior methods. Retraining with a fooling class offers only partial resilience, because SPOOF regains high-confidence fooling under extended query budgets. Together, the results reveal a persistent vulnerability of modern architectures to non-semantic inputs, highlight the gap between recognition performance and out-of-distribution robustness, and motivate defense strategies beyond traditional fine-tuning.

Abstract

Deep neural networks (DNNs) excel across image recognition tasks, yet continue to exhibit overconfidence on inputs that bear no resemblance to natural images. Revisiting the "fooling images" work introduced by Nguyen et al. (2015), we re-implement both CPPN-based and direct-encoding-based evolutionary fooling attacks on modern architectures, including convolutional and transformer classifiers. Our re-implementations confirm that high-confidence fooling persists even in state-of-the-art networks. The transformer-based ViT-B/16 emerges as the most susceptible, reaching near-certain misclassifications with substantially fewer queries than convolution-based models. We then introduce SPOOF, a minimalist, consistent, and more efficient black-box attack for generating high-confidence fooling images. Despite its simplicity, SPOOF generates unrecognizable fooling images with minimal pixel modifications and drastically reduced compute. Furthermore, retraining with fooling images as an additional class provides only partial resistance: SPOOF continues to fool consistently with slightly higher query budgets, highlighting the persistent fragility of modern deep classifiers.
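
To make the idea of a query-based, sparse-pixel black-box attack concrete, the following is a minimal sketch only. It is not the authors' SPOOF algorithm, whose exact update rule is not given in this summary. It assumes a generic greedy random search that overwrites a few pixels per query and keeps only changes that raise the target-class confidence. The names `make_toy_classifier`, `sparse_pixel_fool`, and all parameters are hypothetical, and the "model" is a toy linear softmax classifier standing in for a real network.

```python
import math
import random


def make_toy_classifier(num_pixels, num_classes, seed=0):
    """Hypothetical stand-in for a black-box model (NOT a real network).

    Returns a function mapping a flat pixel list to softmax probabilities
    computed from fixed random linear scores.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(num_pixels)]
               for _ in range(num_classes)]

    def predict(image):
        scores = [sum(w * p for w, p in zip(row, image)) for row in weights]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    return predict


def sparse_pixel_fool(predict, num_pixels, target, budget=2000,
                      pixels_per_step=2, threshold=0.99, seed=0):
    """Greedy random search with sparse pixel updates (illustrative assumption).

    Only the model's output probabilities are used, so the search is black-box.
    """
    rng = random.Random(seed)
    image = [0.0] * num_pixels           # start from a blank, non-semantic image
    best = predict(image)[target]
    queries = 1
    while queries < budget and best < threshold:
        candidate = list(image)
        # Overwrite a small random subset of pixels with values in [0, 1).
        for idx in rng.sample(range(num_pixels), pixels_per_step):
            candidate[idx] = rng.random()
        conf = predict(candidate)[target]
        queries += 1
        if conf > best:                  # accept only improving changes
            image, best = candidate, conf
    return image, best, queries


if __name__ == "__main__":
    model = make_toy_classifier(num_pixels=64, num_classes=5)
    _, confidence, used = sparse_pixel_fool(model, 64, target=3)
    print(f"target confidence {confidence:.3f} after {used} queries")
```

Because each step changes only a few pixels and costs one query, the query count doubles as a measure of attack cost, which is the quantity the abstract compares across architectures.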

Paper Structure

This paper contains 11 sections, 23 figures, 5 tables, 1 algorithm.

Figures (23)

  • Figure 1: Mean confidence for CPPN-Fool across 1000 ImageNet target classes.
  • Figure 2: Fooling confidence vs. generations/queries for the CPPN-Fool (Original) re-implementation.
  • Figure 2: Mean confidence for Direct-Fool across 1000 ImageNet target classes.
  • Figure 3: Fooling confidence heatmap across all ImageNet-1K classes for CPPN-Fool (Original) re-implementation.
  • Figure 3: Mean confidence for SPOOF across 1000 ImageNet target classes.
  • ...and 18 more figures