Provably effective detection of effective data poisoning attacks

Jonathan Gallagher, Yasaman Esfandiari, Callen MacPhee, Michael Warren

TL;DR

The paper tackles data-poisoning detection by formalizing poisoning as a stochastic process on exchangeable data sequences and introducing the Conformal Separability Test, a polynomial-time, information-theoretic detector. It defines trigger attacks via a split $D = D_P \uplus D_C$ and Markov-kernel-based triggers, then proves that effectively poisoned data yield separable conformal prediction sets, enabling reliable detection. The authors provide both theoretical guarantees and empirical validation on CIFAR-10 and GTSRB, including robust detection of subtle attacks like Witches' Brew and competitive false positive/negative rates relative to state-of-the-art defenses. This work offers a principled, provable defense framework with practical applicability for proactive data integrity in ML pipelines.

Abstract

This paper establishes a mathematically precise definition of a dataset poisoning attack and proves that the very act of effectively poisoning a dataset ensures that the attack can be effectively detected. On top of a mathematical guarantee that dataset poisoning is identifiable by a new statistical test that we call the Conformal Separability Test, we provide experimental evidence that we can adequately detect poisoning attempts in the real world.

Paper Structure

This paper contains 13 sections, 10 theorems, 29 equations, 6 figures, 3 tables, and 1 algorithm.

Key Result

Theorem 3.22

[Vovk et al.] For any exchangeable sequence of variables $\langle Z_1,\ldots,Z_{n+1}\rangle: \Omega \to (\mathcal{X} \times \mathcal{Y})^{n+1}$, we have
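
As a hedged illustration of the kind of guarantee conformal prediction gives for exchangeable data, the sketch below empirically checks the standard marginal coverage bound: a prediction set built at significance level $\varepsilon$ contains the true outcome with probability at least $1 - \varepsilon$. It uses split conformal prediction with an absolute-residual score, which is an assumption made for brevity and may differ in form from the theorem in the paper. The names `conformal_quantile` and `simulate_coverage` are hypothetical helpers, not the authors' code.

```python
import math
import random


def conformal_quantile(scores, epsilon):
    """Return the ceil((n + 1) * (1 - epsilon))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - epsilon))
    if k > n:
        # Too few calibration points: only the trivial (everything) set is valid.
        return math.inf
    return sorted(scores)[k - 1]


def simulate_coverage(n_cal=200, epsilon=0.1, trials=2000, seed=0):
    """Estimate how often a fresh score falls inside the conformal threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # i.i.d. draws are exchangeable. The score is |y - prediction| for a
        # fixed (assumed) predictor that always outputs 0.
        calibration = [abs(rng.gauss(0.0, 1.0)) for _ in range(n_cal)]
        threshold = conformal_quantile(calibration, epsilon)
        test_score = abs(rng.gauss(0.0, 1.0))
        hits += test_score <= threshold
    return hits / trials


coverage = simulate_coverage()
print(f"empirical coverage at epsilon=0.1: {coverage:.3f}")
```

The finite-sample correction `(n + 1)` in the quantile index is what makes the bound hold without distributional assumptions; with a plain empirical quantile the coverage can fall slightly below $1 - \varepsilon$.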

Figures (6)

  • Figure 1: Notional poisoning of ImageNet. During training, pairs of images and labels (left) are drawn from ImageNet and presented to a model. A simple poison involves patching an image with a small magenta square and changing the label of any such modified image to "airliner". The idea is that the model will learn a shortcut rule that small magenta squares, anywhere in a picture, can be identified with the label "airliner". Note that the patch size and color are chosen for visual clarity alone; one would not use such a blatant patch in practice. Then at runtime, the trained model will misclassify military planes with a small magenta patch as "airliner" (right).
  • Figure 2: Depiction of a typical machine learning pipeline with highlighted threat surfaces (red) coming from an OWASP STRIDE-style data flow diagram. The threat surfaces in this analysis result mostly from the possibility of data tampering.
  • Figure 3: An approximate visualization of conformal prediction sets at two confidence levels for the Swiss-roll dataset.
  • Figure 4: Visualization of Conformal Separability. Conformal Separability is a conservative test for inequality of conformal prediction sets. Here the magenta and orange sets are not conformally separable until confidence is high. The blue and orange spheres are much more separable, as they do not overlap until the confidence becomes remarkably low. For the red and blue to overlap, one would need astronomically low confidence. This is akin to testing whether two normal distributions are separated by checking for overlap of their confidence intervals, but ported to a distribution-free setting where the test is over arbitrarily irregular distributions on high-dimensional manifolds.
  • Figure 5: Sample CIFAR-10 (top) and GTSRB (bottom) images with the patch applied.
  • ...and 1 more figure
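
The patch poison described in Figure 1 can be sketched in a few lines. This is a minimal toy version under stated assumptions: images are nested lists of RGB tuples, the patch is a 3×3 magenta square at a random position, and the helper names `apply_patch` and `poison_dataset` are hypothetical. The returned index set plays the role of the poisoned part $D_P$ in the split $D = D_P \uplus D_C$, with the untouched examples forming $D_C$.

```python
import random

MAGENTA = (255, 0, 255)


def apply_patch(image, top, left, size=3, color=MAGENTA):
    """Return a copy of `image` (rows of RGB tuples) with a square patch painted in."""
    patched = [list(row) for row in image]
    for r in range(top, top + size):
        for c in range(left, left + size):
            patched[r][c] = color
    return patched


def poison_dataset(dataset, target_label, rate, size=3, seed=0):
    """Patch a random `rate` fraction of (image, label) pairs and relabel them."""
    rng = random.Random(seed)
    n_poison = round(rate * len(dataset))
    poison_idx = set(rng.sample(range(len(dataset)), n_poison))
    result = []
    for i, (image, label) in enumerate(dataset):
        if i in poison_idx:
            height, width = len(image), len(image[0])
            top = rng.randrange(height - size + 1)
            left = rng.randrange(width - size + 1)
            result.append((apply_patch(image, top, left, size), target_label))
        else:
            result.append((image, label))
    return result, poison_idx


# Toy stand-in for real data: twenty all-black 8x8 images with one label.
clean = [([[(0, 0, 0)] * 8 for _ in range(8)], "military plane") for _ in range(20)]
poisoned, poison_idx = poison_dataset(clean, target_label="airliner", rate=0.1)
print(len(poison_idx), "of", len(poisoned), "examples poisoned")
```

Copying each image before patching keeps the clean dataset intact, which makes it easy to compare a detector's output on the clean and poisoned versions of the same data.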

Theorems & Definitions (44)

  • Definition 3.1
  • Definition 3.3
  • Example 3.4: Probability distributions
  • Example 3.5: Deterministic functions
  • Example 3.6: Stochastic matrices
  • Definition 3.7
  • Example 3.8
  • Example 3.9
  • Example 3.10
  • Definition 3.11
  • ...and 34 more
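
The listing gives only the titles of Examples 3.4 to 3.6. Assuming they illustrate Markov kernels, the objects the TL;DR says triggers are built from, the finite case can be sketched as follows: a kernel between finite sets is a row-stochastic matrix, a probability distribution is a kernel with a single row, and a deterministic function is a kernel whose rows are one-hot. The function names and the toy matrices below are hypothetical and chosen only for illustration.

```python
def is_stochastic(matrix, tol=1e-9):
    """Check that every row is a probability distribution."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def compose(k1, k2):
    """Compose two finite kernels (apply k1, then k2) by matrix multiplication."""
    return [
        [sum(k1[i][j] * k2[j][k] for j in range(len(k2))) for k in range(len(k2[0]))]
        for i in range(len(k1))
    ]


def deterministic_kernel(f, n_inputs, n_outputs):
    """Encode f: {0..n_inputs-1} -> {0..n_outputs-1} as a 0/1 stochastic matrix."""
    return [[1.0 if f(i) == j else 0.0 for j in range(n_outputs)] for i in range(n_inputs)]


# A probability distribution: a kernel out of a one-point space (one row).
prior = [[0.5, 0.25, 0.25]]

# A deterministic function: relabel class 2 as class 0, leave the others alone.
relabel = deterministic_kernel(lambda i: 0 if i == 2 else i, 3, 3)

# A general stochastic matrix: class 0 is flipped to class 1 with probability 0.1.
noise = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

pushed = compose(prior, relabel)
print(pushed)  # [[0.75, 0.25, 0.0]]
print(is_stochastic(compose(pushed, noise)))  # True
```

Composition preserves row-stochasticity, which is why deterministic relabeling and random perturbations can be chained freely in this finite setting.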