SoK: Analyzing Adversarial Examples: A Framework to Study Adversary Knowledge
Lucas Fenaux, Florian Kerschbaum
TL;DR
This work addresses the lack of formalization of adversary knowledge in adversarial example research for image classification. It introduces a rigorous framework based on information extraction oracles, distinguishers, and a standardized Adversarial Example Game to compare threat models and attacks, plus information Hasse diagrams to organize knowledge categories. Through a comprehensive survey of attacks from 2022 onward and an extensive evaluation on ImageNet and CIFAR-10, the authors show that attacker access to model, data, and training information dramatically shapes attack effectiveness, with transferable attacks nearly matching white-box performance under certain settings. The framework provides a path toward reproducible, comparable evaluations and reveals that defenses remain fragile against well-informed transferable attacks, underscoring the need for standardized threat models and evaluation protocols in adversarial ML.
Abstract
Adversarial examples are malicious inputs to machine learning models that trigger a misclassification. This type of attack has been studied for close to a decade, yet we find a lack of study and formalization of adversary knowledge when mounting attacks. This has yielded a complex space of attack research with hard-to-compare threat models and attacks. We focus on the image classification domain and provide a theoretical framework to study adversary knowledge inspired by work in order theory. We present an adversarial example game, inspired by cryptographic games, to standardize attacks. We survey recent attacks in the image classification domain and classify their adversary's knowledge in our framework. From this systematization, we compile results that both confirm existing beliefs about adversary knowledge, such as the potency of information about the attacked model, and allow us to derive new conclusions on the difficulty associated with the white-box and transferable threat models, for example, that transferable attacks might not be as difficult as previously thought.
