Data Poisoning against Differentially-Private Learners: Attacks and Defenses
Yuzhe Ma, Xiaojin Zhu, Justin Hsu
TL;DR
This work studies data poisoning against differentially private (DP) learners. It shows that such learners are provably resistant when the attacker can modify only a small number $k$ of training items, but become increasingly vulnerable as $k$ grows. It introduces two attack paradigms: DPV, which runs stochastic gradient descent directly against the DP victim, and SV, which attacks a surrogate victim. Both are instantiated for objective-perturbed and output-perturbed logistic and ridge learners, with explicit gradient expressions derived via the KKT conditions. Theoretical results lower-bound the attack cost an adversary can achieve under $\epsilon$-DP (and $(\epsilon,\delta)$-DP), which limits how effective an attack can be. Extensive experiments complement these results, showing that attacks become stronger with larger $k$ and weaker privacy, with deep-DPV often the most effective. The findings emphasize that while DP affords provable resistance, practical poisoning remains feasible once enough data can be modified, motivating tighter bounds and stronger defense strategies.
Abstract
Data poisoning attacks aim to manipulate the model produced by a learning algorithm by adversarially modifying the training set. We consider differential privacy as a defensive measure against this type of attack. We show that differentially-private learners are resistant to data poisoning attacks when the adversary is only able to poison a small number of items. However, this protection degrades as the adversary poisons more data. To illustrate, we design attack algorithms targeting objective and output perturbation learners, two standard approaches to differentially-private machine learning. Experiments show that our methods are effective when the attacker is allowed to poison sufficiently many training items.
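To make the degradation concrete, the following is a minimal sketch based on the standard group-privacy property of pure $\epsilon$-DP, not on the paper's exact theorem statements. If two training sets differ in $k$ items, output probabilities change by at most a factor $e^{k\epsilon}$, so for any nonnegative attack cost the expected cost after poisoning is at least $e^{-k\epsilon}$ times the expected cost on clean data. The function name `poisoned_cost_lower_bound` and its parameters are hypothetical, introduced only for illustration.

```python
import math


def poisoned_cost_lower_bound(clean_cost: float, k: int, epsilon: float) -> float:
    """Lower bound on the attacker's expected cost after poisoning k items.

    Assumptions (illustrative, not taken from the paper's code):
      * the learner satisfies pure epsilon-DP,
      * the attack cost is nonnegative,
      * clean_cost is the expected attack cost on the unpoisoned data.
    Group privacy then gives: poisoned cost >= exp(-k * epsilon) * clean cost.
    """
    if clean_cost < 0:
        raise ValueError("attack cost is assumed to be nonnegative")
    if k < 0 or epsilon < 0:
        raise ValueError("k and epsilon must be nonnegative")
    return math.exp(-k * epsilon) * clean_cost


def guarantee_table(clean_cost: float, epsilon: float, ks: list[int]) -> list[tuple[int, float]]:
    """Evaluate the bound for several poisoning budgets k."""
    return [(k, poisoned_cost_lower_bound(clean_cost, k, epsilon)) for k in ks]
```

With `clean_cost = 1.0` and `epsilon = 0.1`, the bound is about 0.905 for $k = 1$, about 0.368 for $k = 10$, and about $4.5 \times 10^{-5}$ for $k = 100$. The guarantee is meaningful for small $k$ and becomes nearly vacuous as $k$ grows, which matches the qualitative claim that protection weakens as more items are poisoned.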
