Provably effective detection of effective data poisoning attacks

Jonathan Gallagher; Yasaman Esfandiari; Callen MacPhee; Michael Warren

TL;DR

The paper tackles data-poisoning detection by formalizing poisoning as a stochastic process on exchangeable data sequences and introducing the Conformal Separability Test, a polynomial-time, information-theoretic detector. It defines trigger attacks via a split $D = D_P \uplus D_C$ of the dataset into a poisoned part $D_P$ and a clean part $D_C$, together with Markov-kernel-based triggers, and then proves that effectively poisoned data yield separable conformal prediction sets, which enables reliable detection. The authors provide both theoretical guarantees and empirical validation on CIFAR-10 and GTSRB, including robust detection of subtle attacks such as Witches' Brew and false positive/negative rates competitive with state-of-the-art defenses. The result is a principled, provable defense framework that can be applied proactively to protect data integrity in ML pipelines.
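
To make the split $D = D_P \uplus D_C$ concrete, here is a minimal sketch of a patch-style trigger attack in the spirit of Figure 1. It is a hypothetical illustration, not the authors' code: the function `poison_split`, its parameters, and the magenta-square trigger are assumptions. The trigger used here is deterministic, which is only one special case of a Markov-kernel trigger.

```python
import numpy as np

def poison_split(images, labels, poison_frac, target_label, patch_size=3,
                 patch_color=(1.0, 0.0, 1.0), seed=0):
    """Hypothetical trigger attack splitting D into D_P (poisoned) and D_C (clean).

    images: float array of shape (n, H, W, 3) with values in [0, 1].
    labels: int array of shape (n,).
    Returns poisoned copies of the data plus a boolean mask marking D_P.
    """
    rng = np.random.default_rng(seed)
    n, h, w, _ = images.shape
    n_poison = int(round(poison_frac * n))
    # Indices in D_P; every other index belongs to D_C, so D = D_P ⊎ D_C.
    poison_idx = rng.choice(n, size=n_poison, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[poison_idx] = True

    x = images.copy()
    y = labels.copy()
    for i in poison_idx:
        # Deterministic trigger: paste a solid square at a random location.
        r = rng.integers(0, h - patch_size + 1)
        c = rng.integers(0, w - patch_size + 1)
        x[i, r:r + patch_size, c:c + patch_size, :] = patch_color
        # Relabel every triggered example to the attacker's target class.
        y[i] = target_label
    return x, y, mask

# Usage on a dummy all-black dataset with 10 classes.
imgs = np.zeros((100, 32, 32, 3))
labs = np.arange(100) % 10
x, y, mask = poison_split(imgs, labs, poison_frac=0.1, target_label=0)
print(int(mask.sum()), "poisoned examples")
```

Copying the inputs keeps the clean dataset available for comparison, which is convenient when evaluating a detector against a known ground-truth mask.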

Abstract

This paper establishes a mathematically precise definition of a dataset poisoning attack and proves that the very act of effectively poisoning a dataset ensures that the attack can be effectively detected. In addition to a mathematical guarantee that dataset poisoning is identifiable by a new statistical test that we call the Conformal Separability Test, we provide experimental evidence that the test adequately detects poisoning attempts in practice.

Paper Structure (13 sections, 10 theorems, 29 equations, 6 figures, 3 tables, 1 algorithm)

Introduction
Threat model and background work
Trigger poison attacks on labeled data
Probabilistic Setting
What is a trigger attack?
Conformal Separability
Empirical detection of poison attacks using conformal separability
Experimental Results
Experimental Setup
Evaluation Metrics
Results on CIFAR10 and GTSRB
Witches-brew attack on CIFAR10
Conclusion and Future work

Key Result

Theorem 3.22

[Vovk et al.: article:conformal-initial, article:conformal-transduction, book:vovkConformal] For any exchangeable sequence of random variables $\langle Z_1,\ldots,Z_{n+1}\rangle \colon \Omega \to (\mathcal{X} \times \mathcal{Y})^{n+1}$, we have
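
The conclusion of the theorem is cut off in this summary. As background only, the classical result of Vovk et al. for exchangeable data is a marginal coverage guarantee: a conformal prediction set built from $Z_1,\ldots,Z_n$ at significance level $\varepsilon$ contains $Y_{n+1}$ with probability at least $1-\varepsilon$. The sketch below checks that property empirically using split conformal prediction, a standard variant; it is not the paper's Theorem 3.22 or its Conformal Separability Test, and the synthetic data, the fixed predictor $2x$, and the helper `conformal_quantile` are assumptions chosen for illustration.

```python
import numpy as np

def conformal_quantile(scores, eps):
    """Split-conformal threshold at significance level eps.

    Takes the ceil((n + 1) * (1 - eps))-th smallest calibration score, which
    gives marginal coverage >= 1 - eps for exchangeable data.
    """
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - eps)))
    if k > n:
        return np.inf  # too few calibration points: the set is everything
    return np.sort(scores)[k - 1]

rng = np.random.default_rng(0)
eps = 0.1
n_cal, n_test = 500, 20000

# Synthetic exchangeable data: y = 2x + noise; the "model" is the true line.
x_cal = rng.uniform(-1, 1, n_cal)
y_cal = 2 * x_cal + rng.normal(0, 0.5, n_cal)
x_test = rng.uniform(-1, 1, n_test)
y_test = 2 * x_test + rng.normal(0, 0.5, n_test)

# Nonconformity score: absolute residual of the fixed predictor.
q = conformal_quantile(np.abs(y_cal - 2 * x_cal), eps)

# The prediction set for each test point is the interval [2x - q, 2x + q].
covered = np.abs(y_test - 2 * x_test) <= q
print(f"empirical coverage: {covered.mean():.3f} (target >= {1 - eps})")
```

Because the guarantee is marginal over the calibration draw as well, a single run can land slightly below $1-\varepsilon$; averaging over many calibration draws recovers the bound.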

Figures (6)

Figure 1: Notional poisoning of ImageNet [article:imagenet-original_cvpr09]. During training, pairs of images and labels (left) are drawn from ImageNet and presented to a model. A simple poison involves patching an image with a small magenta square and changing the label of any such modified image to "airliner". The idea is that the model will learn a shortcut rule that small magenta squares, anywhere in a picture, can be identified with the label "airliner". Note that the patch size and color are chosen for visual clarity alone; one would not use such a blatant patch in practice. Then at runtime, the trained model will misclassify military planes carrying a small magenta patch as "airliner" (right).
Figure 2: Depiction of a typical machine learning pipeline with threat surfaces (highlighted in red) derived from an OWASP STRIDE-style data flow diagram. The threat surfaces in this analysis arise mostly from the possibility of data tampering.
Figure 3: An approximate visualization of conformal prediction sets at two confidence levels for the Swiss-roll dataset.
Figure 4: Visualization of Conformal Separability. Conformal Separability is a conservative test for inequality of conformal prediction sets. Here the magenta and orange sets are not conformally separable until confidence is high. The blue and orange spheres are much more separable, as they do not overlap until the confidence becomes remarkably low. For the red and blue sets to overlap, one would need astronomically low confidence. This is akin to testing whether two normal distributions are separated by checking whether their confidence intervals overlap, but here the test is ported to a distribution-free setting over arbitrarily irregular distributions on high-dimensional manifolds.
Figure 5: Sample CIFAR10 (top) and GTSRB (bottom) images with the patch applied.
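
Figure 4 compares Conformal Separability to checking whether confidence intervals overlap. The toy sketch below illustrates only that analogy and is not the paper's test: it assumes each sample's conformal region is a ball around a centre in some feature space, and the helpers `fit_region`, `radius`, and `max_separable_confidence`, the ball shape, and the confidence grid are all hypothetical choices. It reports the highest confidence $1-\varepsilon$ on the grid at which the two regions are still disjoint, or `None` if they already overlap at the lowest confidence tried.

```python
import numpy as np

def fit_region(points, rng):
    """Split a sample: half estimates a centre, half gives sorted distance scores."""
    idx = rng.permutation(len(points))
    half = len(points) // 2
    centre = points[idx[:half]].mean(axis=0)
    scores = np.sort(np.linalg.norm(points[idx[half:]] - centre, axis=1))
    return centre, scores

def radius(scores, eps):
    """Split-conformal radius at significance level eps (inf if too few scores)."""
    k = int(np.ceil((len(scores) + 1) * (1 - eps)))
    return np.inf if k > len(scores) else scores[k - 1]

def max_separable_confidence(a, b, eps_grid, seed=0):
    """Largest confidence 1 - eps at which the two balls are disjoint, else None."""
    rng = np.random.default_rng(seed)
    ca, sa = fit_region(a, rng)
    cb, sb = fit_region(b, rng)
    gap = np.linalg.norm(ca - cb)
    best = None
    # Radii grow as eps shrinks, so once the balls overlap they keep overlapping.
    for eps in sorted(eps_grid, reverse=True):
        if radius(sa, eps) + radius(sb, eps) < gap:
            best = 1 - eps
        else:
            break
    return best

rng = np.random.default_rng(1)
clean = rng.normal(0.0, 1.0, size=(400, 8))
far = rng.normal(6.0, 1.0, size=(400, 8))
near = rng.normal(0.3, 1.0, size=(400, 8))
grid = [0.5, 0.2, 0.1, 0.05, 0.01]
print(max_separable_confidence(clean, far, grid))
print(max_separable_confidence(clean, near, grid))
```

Balls are the simplest region shape for which disjointness reduces to comparing a centre distance with a sum of radii; irregular, high-dimensional distributions of the kind the caption mentions would call for richer nonconformity scores.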

Theorems & Definitions (44)

Definition 3.1
Definition 3.3
Example 3.4: Probability distributions
Example 3.5: Deterministic functions
Example 3.6: Stochastic matrices
Definition 3.7
Example 3.8
Example 3.9
Example 3.10
Definition 3.11
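
The list above names probability distributions, deterministic functions and stochastic matrices as Examples 3.4 to 3.6. Assuming, as the TL;DR's mention of Markov-kernel-based triggers suggests, that these are examples of Markov kernels, the finite case can be sketched with row-stochastic matrices: a distribution is a kernel out of a one-point set, a deterministic function is a 0/1 matrix, and composing kernels is matrix multiplication. The helper names and the label-flipping kernel below are hypothetical illustrations, not definitions from the paper.

```python
import numpy as np

def is_kernel(k):
    """A finite Markov kernel X -> Y is a row-stochastic |X| x |Y| matrix."""
    return bool(np.all(k >= 0) and np.allclose(k.sum(axis=1), 1.0))

def from_function(f, n_in, n_out):
    """Deterministic function {0..n_in-1} -> {0..n_out-1} as a 0/1 kernel."""
    k = np.zeros((n_in, n_out))
    k[np.arange(n_in), [f(i) for i in range(n_in)]] = 1.0
    return k

def compose(k1, k2):
    """Kernel composition (apply k1, then k2) is the matrix product."""
    return k1 @ k2

# A distribution on 3 labels is a kernel from a one-point set: a 1 x 3 matrix.
prior = np.array([[0.5, 0.3, 0.2]])
# Hypothetical stochastic trigger: label 1 is flipped to label 0 with prob 0.4.
flip = np.array([[1.0, 0.0, 0.0],
                 [0.4, 0.6, 0.0],
                 [0.0, 0.0, 1.0]])
# Deterministic relabelling that sends every label to class 0.
to_zero = from_function(lambda i: 0, 3, 3)

print(compose(prior, flip))
print(compose(prior, to_zero))
```

Here `compose(prior, flip)` is the label distribution after the stochastic trigger, with mass moved from label 1 to label 0, while composing with `to_zero` collapses all mass onto class 0.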
