SoK: Analyzing Adversarial Examples: A Framework to Study Adversary Knowledge

Lucas Fenaux; Florian Kerschbaum

TL;DR

This work addresses the lack of formalization for adversary knowledge in adversarial example research for image classification. It introduces a rigorous framework based on information extraction oracles, distinguishers, and a standardized Adversarial Example Game to compare threat models and attacks, plus information Hasse diagrams to organize knowledge categories. Through a comprehensive survey of 2022+ attacks and an extensive evaluation on ImageNet and CIFAR-10, the authors show that attacker access to model, data, and training information dramatically shapes attack effectiveness, with transferable attacks nearly matching white-box performance under certain settings. The framework provides a path toward reproducible, comparable evaluations and reveals that defenses remain fragile against well-informed transferable attacks, underscoring the need for standardized threat models and evaluation protocols in adversarial ML.

Abstract

Adversarial examples are malicious inputs to machine learning models that trigger a misclassification. This type of attack has been studied for close to a decade, and we find that there is a lack of study and formalization of adversary knowledge when mounting attacks. This has yielded a complex space of attack research with hard-to-compare threat models and attacks. We focus on the image classification domain and provide a theoretical framework to study adversary knowledge inspired by work in order theory. We present an adversarial example game, inspired by cryptographic games, to standardize attacks. We survey recent attacks in the image classification domain and classify their adversary's knowledge in our framework. From this systematization, we compile results that both confirm existing beliefs about adversary knowledge, such as the potency of information about the attacked model, and allow us to derive new conclusions on the difficulty associated with the white-box and transferable threat models, for example, that transferable attacks might not be as difficult as previously thought.

Paper Structure (36 sections, 7 theorems, 7 equations, 7 figures, 15 tables)

Introduction
Definitions
Previous Work
Formalization
Categorization
Information Extraction Oracles
Information Categories
Information Hasse Diagrams
Model Information Hasse Diagram
Data Information Hasse Diagram
Training Information Hasse Diagram
Defense Information Hasse Diagram
Adversarial Example Game
Definitions
Measuring Success
...and 21 more sections

Key Result

Theorem 1

The Hasse diagram in Figure 1 (Model Oracle Hasse Diagram) holds under the $\sqsubset$ ordering.
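
For readers unfamiliar with Hasse diagrams, the sketch below shows how such a diagram is derived from a strict partial order: it keeps only the covering pairs, that is, pairs with no element strictly in between. Everything specific here is an assumption made for illustration: the oracle names, the information sets attached to them, and the use of proper-subset inclusion as a stand-in for $\sqsubset$ are hypothetical and are not the paper's categories or its definition of the ordering.

```python
from itertools import permutations

# Hypothetical oracles, each modelled as the set of information it reveals.
# These names and sets are illustrative only, not the paper's categories.
info = {
    "none": frozenset(),
    "labels": frozenset({"label"}),
    "scores": frozenset({"label", "score"}),
    "gradients": frozenset({"label", "score", "gradient"}),
    "architecture": frozenset({"arch"}),
    "full": frozenset({"label", "score", "gradient", "arch"}),
}


def dominates(hi, lo):
    """Assumed stand-in for the ordering: hi reveals strictly more than lo."""
    return info[lo] < info[hi]  # proper-subset test on frozensets


def hasse_edges(elements, dominates):
    """Return the covering pairs (lo, hi) of a strict partial order.

    A pair is kept when hi dominates lo and no third element sits
    strictly between them; these pairs are the edges of the Hasse diagram.
    """
    elements = list(elements)
    edges = []
    for lo, hi in permutations(elements, 2):
        if not dominates(hi, lo):
            continue
        has_middle = any(
            dominates(hi, mid) and dominates(mid, lo)
            for mid in elements
            if mid not in (lo, hi)
        )
        if not has_middle:
            edges.append((lo, hi))
    return edges


edges = hasse_edges(info, dominates)
for lo, hi in sorted(edges):
    print(f"{lo} -> {hi}")
```

In this toy order, "architecture" and "labels" are incomparable, so the diagram branches rather than forming a single chain. That branching is the reason a partial order, and not a simple ranking, suits the comparison of different kinds of adversary knowledge.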

Figures (7)

Figure 1: Model Oracle Hasse Diagram
Figure 2: Data Oracle Hasse Diagram
Figure 3: Train Oracle Hasse Diagram
Figure 4: Defense Oracle Hasse Diagram
Figure 5: Adversarial Example Game Diagram Algorithm
...and 2 more figures

Theorems & Definitions (29)

Definition 1: Adversarial Example
Definition 2: Grounded Adversarial Example
Definition 3: Targeted/Untargeted Adversarial Example
Definition 4: Indistinguishability
Definition 5: Stealth
Definition 6: Information Extraction Oracle
Definition 7
Definition 8: Information Extraction Oracle domination operator
Definition 9: Information Extraction Oracle Combination
Theorem 1
...and 19 more
