Provable Watermarking for Data Poisoning Attacks

Yifan Zhu, Lijia Yu, Xiao-Shan Gao

TL;DR

This work addresses the tension between harmless data poisoning used for ownership verification and its potential misuse by proposing provable watermarking schemes that declare the presence of poisoning. It introduces post-poisoning watermarking and poisoning-concurrent watermarking, with rigorous watermark-length thresholds, $\Theta(\sqrt{d}/\epsilon_w)$ for post-poisoning and between $\Theta(1/\epsilon_w^2)$ and $O(\sqrt{d}/\epsilon_p)$ for poisoning-concurrent watermarking, that ensure watermark detectability while preserving poisoning utility. The analysis extends from sample-wise to universal watermarking, including distributional generalization, and proves soundness of poisoning under watermarking for $L$-layer networks; experiments cover backdoor and availability attacks on multiple datasets and models. Empirically, increasing the watermark length improves detectability (AUROC) and, within the budget constraints, retains poisoning power, with poisoning-concurrent schemes generally outperforming post-poisoning ones in detection and resilience. The results offer a practical, provable pathway to transparent data provenance and ownership, enabling trusted deployment of data-poisoning techniques for legitimate purposes while mitigating disputes and misuse.

Abstract

In recent years, data poisoning attacks have increasingly been designed to appear harmless and even beneficial, often with the intention of verifying dataset ownership or safeguarding private data from unauthorized use. However, these developments can cause misunderstandings and conflicts, as data poisoning has traditionally been regarded as a security threat to machine learning systems. To address this issue, it is imperative for generators of harmless poisons to claim ownership of their generated datasets, so that users can identify potential poisoning and prevent misuse. In this paper, we propose deploying watermarking schemes as a solution to this challenge. We introduce two provable and practical watermarking approaches for data poisoning: post-poisoning watermarking and poisoning-concurrent watermarking. Our analyses show that when the watermark length is $\Theta(\sqrt{d}/\epsilon_w)$ for post-poisoning watermarking, and falls within the range of $\Theta(1/\epsilon_w^2)$ to $O(\sqrt{d}/\epsilon_p)$ for poisoning-concurrent watermarking, the watermarked poisoned dataset provably ensures both watermark detectability and poisoning utility, certifying the practicality of watermarking under data poisoning attacks. We validate our theoretical findings through experiments on several attacks, models, and datasets.
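Because the length bounds are stated with $\Theta(\cdot)$ and $O(\cdot)$, they fix only how the watermark length scales with $d$, $\epsilon_w$ and $\epsilon_p$, not its exact value. The sketch below evaluates the three expressions with every hidden constant set to 1, an assumption made purely for illustration, for a CIFAR-10-sized input ($d = 32 \times 32 \times 3 = 3072$) and assumed budgets $\epsilon_w = \epsilon_p = 8/255$. The helper name `length_orders` is hypothetical.

```python
import math

def length_orders(d: int, eps_w: float, eps_p: float) -> dict:
    """Order-of-magnitude watermark lengths with all hidden constants set to 1.

    The Theta/O bounds hide constants, so these numbers only show how the
    thresholds scale with d, eps_w and eps_p; they are not usable lengths.
    """
    return {
        # post-poisoning: q = Theta(sqrt(d) / eps_w)
        "post_poisoning": math.sqrt(d) / eps_w,
        # poisoning-concurrent, lower end: Theta(1 / eps_w^2)
        "concurrent_low": 1.0 / eps_w ** 2,
        # poisoning-concurrent, upper end: O(sqrt(d) / eps_p)
        "concurrent_high": math.sqrt(d) / eps_p,
    }

# CIFAR-10-sized input with assumed budgets of 8/255 for both watermark and poison
orders = length_orders(d=3072, eps_w=8 / 255, eps_p=8 / 255)
for name, value in orders.items():
    print(f"{name}: {value:.0f}")
```

Under this constant-free reading, the concurrent range is nonempty only when $1/\epsilon_w^2 \le \sqrt{d}/\epsilon_p$; with the assumed values the post-poisoning order is about 1767 and the concurrent range runs from about 1016 to 1767.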


Paper Structure

This paper contains 31 sections, 29 theorems, 121 equations, 4 figures, 14 tables, 3 algorithms.

Key Result

Theorem 4.1

For any data point $x$ sampled from ${\mathcal{D}}_{{\mathcal{X}}}$ with corresponding poison $\delta_x^p$, there exists a distribution $\Xi$ on ${\mathbb{R}}^d$ such that a key $\zeta\sim\Xi$ can be sampled satisfying, for any $\omega\in(0,1)$: (1) ${\mathbb{P}}_{x\
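To make the objects in Theorem 4.1 concrete, here is a minimal NumPy sketch of sample-wise post-poisoning watermarking under stated assumptions: the key distribution $\Xi$ is taken to be random $\pm 1$ signs on $q$ randomly chosen coordinates, the watermark is $\epsilon_w$ times the key added after the poison, and detection uses a key-correlation score. These choices, the helper names (`make_key`, `embed_watermark`, `detection_score`) and the random-sign stand-in poison are hypothetical illustrations, not the paper's construction.

```python
import numpy as np

# Hypothetical sketch: the key distribution, watermark form and detector below are
# illustrative assumptions, not the construction from Theorem 4.1.

def make_key(d, q, rng):
    """Assumed key: q distinct coordinates with a random +/-1 sign on each."""
    coords = rng.choice(d, size=q, replace=False)
    signs = rng.choice([-1.0, 1.0], size=q)
    return coords, signs

def embed_watermark(x_poisoned, key, eps_w):
    """Post-poisoning step: add an eps_w-bounded watermark on the key coordinates."""
    coords, signs = key
    x_w = x_poisoned.copy()
    x_w[..., coords] += eps_w * signs
    return np.clip(x_w, 0.0, 1.0)  # keep pixel values in [0, 1]

def detection_score(x, key):
    """Average key-signed deviation of the key coordinates from mid-gray 0.5."""
    coords, signs = key
    return ((x[..., coords] - 0.5) * signs).mean(axis=-1)

rng = np.random.default_rng(0)
n, d, q = 200, 3072, 500           # d matches a flattened 32x32x3 image
eps_p, eps_w = 8 / 255, 8 / 255    # assumed budgets
x_clean = rng.uniform(0.0, 1.0, size=(n, d))            # stand-in images
delta_p = eps_p * rng.choice([-1.0, 1.0], size=(n, d))  # stand-in poison, not a real attack
x_poisoned = np.clip(x_clean + delta_p, 0.0, 1.0)
key = make_key(d, q, rng)
x_marked = embed_watermark(x_poisoned, key, eps_w)
print("watermarked:", detection_score(x_marked, key).mean())
print("clean:      ", detection_score(x_clean, key).mean())
```

With these assumed settings the watermarked score sits close to $\epsilon_w$ while the clean score stays near zero, and the per-sample noise of the score shrinks like $1/\sqrt{q}$, which is one intuition for why longer watermarks are easier to detect.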

Figures (4)

  • Figure 1: The Acc, ASR and AUROC of the AdvSc backdoor attack under different budgets $\epsilon_w$ for poisoning-concurrent watermarking with $q=1000$.
  • Figure 2: The Acc and AUROC of the UE availability attack at different watermark positions for poisoning-concurrent watermarking with $q=500$.
  • Figure 3: Visualization of UE poisoning-concurrent watermarking with length $q=500$ on the CIFAR-10 dataset. The first row shows the benign images, the second row the normalized UE poisons, the third row the normalized watermarks, and the fourth row the images perturbed by the watermarked poisons.
  • Figure 4: The detection performance (AUROC) of post-poisoning watermarking for several data poisoning attacks under the corresponding key and a random key.
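The key-versus-random-key comparison in Figure 4 can be illustrated on synthetic data. The sketch below assumes a hypothetical setup (random-sign key on $q$ coordinates, watermark $\epsilon_w$ times the key, key-correlation score) and computes AUROC directly as the probability that a watermarked sample outscores a clean one; the names `random_key`, `score` and `auroc` are illustrative, and nothing here reproduces the paper's detector or numbers.

```python
import numpy as np

# Hypothetical setup: random-sign key on q coordinates, watermark = eps_w * key,
# key-correlation score. Not the paper's detector.

def random_key(d, q, rng):
    """Assumed key: random +/-1 signs on q random coordinates, zeros elsewhere."""
    key = np.zeros(d)
    key[rng.choice(d, size=q, replace=False)] = rng.choice([-1.0, 1.0], size=q)
    return key

def score(x, key):
    """Key-weighted mean deviation from mid-gray over the key's support."""
    return (x - 0.5) @ key / np.count_nonzero(key)

def auroc(pos, neg):
    """AUROC = P(positive score > negative score), counting ties as 1/2."""
    pos = np.asarray(pos)[:, None]
    neg = np.asarray(neg)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())

rng = np.random.default_rng(1)
n, d, q, eps_w = 500, 3072, 1000, 8 / 255
clean = rng.uniform(0.0, 1.0, size=(n, d))  # stand-in clean images
true_key = random_key(d, q, rng)
marked = np.clip(rng.uniform(0.0, 1.0, size=(n, d)) + eps_w * true_key, 0.0, 1.0)
other_key = random_key(d, q, rng)           # a key unrelated to the watermark

auroc_true = auroc(score(marked, true_key), score(clean, true_key))
auroc_random = auroc(score(marked, other_key), score(clean, other_key))
print(f"corresponding key AUROC: {auroc_true:.3f}")
print(f"random key AUROC:        {auroc_random:.3f}")
```

Under these assumptions the corresponding key should give an AUROC near 1 and the unrelated key an AUROC near 0.5, mirroring the qualitative contrast described in the Figure 4 caption; the values come from synthetic data and say nothing about the paper's measured results.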

Theorems & Definitions (51)

  • Theorem 4.1: Sample-wise, post-poisoning watermarking
  • Remark 4.2
  • Theorem 4.3: Sample-wise, poisoning-concurrent watermarking
  • Remark 4.4
  • Remark 4.5
  • Corollary 4.6
  • Proposition 4.7: Universal, post-poisoning watermarking
  • Corollary 4.8
  • Theorem 4.9: Universal, post-poisoning watermarking for most examples
  • Remark 4.10
  • ...and 41 more