SEA: Shareable and Explainable Attribution for Query-based Black-box Attacks

Yue Gao; Ilia Shumailov; Kassem Fawaz

TL;DR

This paper tackles the lack of forensic and intelligence-sharing mechanisms for query-based black-box attacks on ML systems. It introduces SEA, which models attack traces as Hidden Markov Models to attribute, explain, and fingerprint attack behavior, enabling human-understandable intelligence sharing. SEA demonstrates that a fingerprint can be produced from the first incident and used to accurately recognize subsequent incidents with high Top-1 and Top-3 accuracy across image and text tasks, even under adaptive strategies. By focusing on attack progression and per-query behavior, SEA provides explainability and transferable fingerprints, offering a practical path to post-incident forensics within security frameworks like NIST for ML systems.

Abstract

Machine Learning (ML) systems are vulnerable to adversarial examples, particularly those from query-based black-box attacks. Despite various efforts to detect and prevent such attacks, ML systems are still at risk, demanding a more comprehensive approach to security that includes logging, analyzing, and sharing evidence. While traditional security benefits from well-established practices of forensics and threat intelligence sharing, ML security has yet to find a way to profile its attackers and share information about them. In response, this paper introduces SEA, a novel ML security system to characterize black-box attacks on ML systems for forensic purposes and to facilitate human-explainable intelligence sharing. SEA leverages Hidden Markov Models to attribute the observed query sequence to known attacks. It thus understands the attack's progression rather than focusing solely on the final adversarial examples. Our evaluations reveal that SEA is effective at attack attribution, even on the second incident, and is robust to adaptive strategies designed to evade forensic analysis. SEA's explanations of the attack's behavior allow us even to fingerprint specific minor bugs in widely used attack libraries. For example, we discover that the SignOPT and Square attacks in ART v1.14 send over 50% duplicated queries. We thoroughly evaluate SEA on a variety of settings and demonstrate that it can recognize the same attack with more than 90% Top-1 and 95% Top-3 accuracy. Finally, we demonstrate how SEA generalizes to other domains like text classification.
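To make the attribution step concrete, here is a minimal sketch of scoring one observed trace against several candidate HMMs and ranking them by log-likelihood. It is not SEA's implementation. This excerpt does not say how SEA defines its hidden states, observations, or fingerprints, so the sketch assumes discrete observation symbols, and the names `log_likelihood`, `attribute`, `attack_a`, `attack_b`, and all toy parameters are hypothetical.

```python
import numpy as np

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for discrete symbols.

    start: (S,) initial state probabilities
    trans: (S, S) transition matrix, rows sum to 1
    emit:  (S, V) emission matrix, rows sum to 1
    """
    alpha = start * emit[:, obs[0]]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step, then weight by the emission probability.
            alpha = (alpha @ trans) * emit[:, obs[t]]
        scale = alpha.sum()
        if scale == 0.0:
            return -np.inf  # the trace is impossible under this model
        log_p += np.log(scale)
        alpha = alpha / scale  # rescale to avoid numerical underflow
    return log_p

def attribute(obs, fingerprints, k=3):
    """Return the k fingerprint names that best explain the trace."""
    scores = {name: log_likelihood(obs, *params)
              for name, params in fingerprints.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy fingerprints (assumed values, for illustration only).
fingerprints = {
    "attack_a": (np.array([1.0, 0.0]),
                 np.array([[0.9, 0.1], [0.1, 0.9]]),
                 np.array([[0.9, 0.1], [0.2, 0.8]])),
    "attack_b": (np.array([0.5, 0.5]),
                 np.array([[0.5, 0.5], [0.5, 0.5]]),
                 np.array([[0.1, 0.9], [0.1, 0.9]])),
}

trace = [0, 0, 0, 0, 1, 0, 0]  # hypothetical symbol sequence from one incident
ranking = attribute(trace, fingerprints, k=2)
```

Ranking by likelihood gives Top-1 and Top-3 candidates directly, which is the form in which the abstract reports accuracy.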

Paper Structure (103 sections, 17 equations, 11 figures, 10 tables, 1 algorithm)

Introduction
Background and Related Work
Black-box Adversarial Examples
Black-box Attacks
Black-box Defenses
Digital Forensics
Forensics Research in Machine Learning
Distinction from Black-box Defenses
Evasion Forensics
Threat Model
Attacker
Forensic System
Non-Goals
Problem Definition
Theoretical Limitation
...and 88 more sections

Figures (11)

Figure 1: The general scenario for our forensic system. Left: The attacker sends malicious queries to the model to construct an adversarial example. Right: After the attack has triggered a security alarm, the forensic system extracts the attack's trace to attribute, explain, and share the attack's behavior.
Figure 2: The first 200 queries of HSJ-2 and GeoDA-2 attacks against the same clean image. Left: It is hard to precisely identify each attack's actions in each query. Middle and Right: The per-query changes between successive queries reveal detailed, human-identifiable behaviors of the two attacks, especially in the spectrum space.
Figure 3: The analogy between black-box attacks and HMMs.
Figure 4: Fingerprints of the 11 attack variants we studied. Detailed numbers and variances can be found in the appendix.
Figure 5: Depicting the procedure discovery and modeling process based on per-query changes of the GeoDA-2 attack.
...and 6 more figures
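The per-query changes mentioned in the Figure 2 caption can be illustrated with a short sketch: subtract each query from its successor, then inspect the difference in the frequency domain. This is an assumption-laden illustration and not the paper's feature pipeline. It assumes queries are 2-D grayscale arrays of equal shape, and the helper names `per_query_changes` and `log_spectrum` are hypothetical.

```python
import numpy as np

def per_query_changes(queries):
    """Differences between successive queries; shape (N - 1, H, W)."""
    q = np.asarray(queries, dtype=np.float64)
    return q[1:] - q[:-1]

def log_spectrum(delta):
    """Log-magnitude 2-D spectrum of one change, zero frequency centered."""
    return np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(delta))))

# Toy trace of three 4x4 "images" (assumed data, for illustration only).
queries = np.stack([np.zeros((4, 4)), np.ones((4, 4)), np.ones((4, 4))])
deltas = per_query_changes(queries)
spectrum = log_spectrum(deltas[0])
```

In this toy trace the second change is all zeros, meaning the same query was sent twice. Duplicated queries like this are the kind of library behavior the abstract reports for SignOPT and Square in ART v1.14.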
