Adversary Instantiation: Lower Bounds for Differentially Private Machine Learning

Milad Nasr, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Nicholas Carlini

TL;DR

The paper tackles the problem of understanding how tight the differential privacy guarantees for DP-SGD are in practice by instantiating a formal adversary in multiple threat models. It introduces Crafter and Distinguisher components to construct lower bounds on the privacy budget $\varepsilon$ and applies them across six attack configurations, ranging from API access to malicious datasets, on datasets like MNIST, CIFAR-10, and Purchase. The results show that with full adversary capabilities the empirical lower bound matches the DP-SGD upper bound, indicating tightness, while under realistic restrictions the gap widens, suggesting that practical privacy can be stronger than worst-case analyses imply. The work thus clarifies when current DP-SGD analyses are tight and points to which adversary capabilities or assumptions would be needed to improve guarantees, guiding both theoretical advances and deployment decisions.

Abstract

Differentially private (DP) machine learning allows us to train models on private data while limiting data leakage. DP formalizes this data leakage through a cryptographic game, where an adversary must predict if a model was trained on a dataset D, or a dataset D' that differs in just one example. If observing the training algorithm does not meaningfully increase the adversary's odds of successfully guessing which dataset the model was trained on, then the algorithm is said to be differentially private. Hence, the purpose of privacy analysis is to upper bound the probability that any adversary could successfully guess which dataset the model was trained on. In our paper, we instantiate this hypothetical adversary in order to establish lower bounds on the probability that this distinguishing game can be won. We use this adversary to evaluate the importance of the adversary capabilities allowed in the privacy analysis of DP training algorithms. For DP-SGD, the most common method for training neural networks with differential privacy, our lower bounds are tight and match the theoretical upper bound. This implies that in order to prove better upper bounds, it will be necessary to make use of additional assumptions. Fortunately, we find that our attacks are significantly weaker when additional (realistic) restrictions are put in place on the adversary's capabilities. Thus, in the practical setting common to many real-world deployments, there is a gap between our lower bounds and the upper bounds provided by the analysis: differential privacy is conservative and adversaries may not be able to leak as much information as suggested by the theoretical bound.
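
The distinguishing game in the abstract corresponds to the standard $(\varepsilon, \delta)$-differential privacy definition, which this summary does not restate. As a sketch (the symbols $\mathcal{M}$, $S$, $\mathrm{FP}$, and $\mathrm{FN}$ are notation assumed here, not taken from the text above): a training mechanism $\mathcal{M}$ is $(\varepsilon, \delta)$-DP if, for all datasets $D$ and $D'$ differing in one example and all sets of outputs $S$,

$$\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta.$$

Taking $S$ to be the outputs on which a distinguisher guesses $D'$, and writing $\mathrm{FP}$ and $\mathrm{FN}$ for its false positive and false negative rates, the definition (applied to $S$ and to its complement) implies

$$\mathrm{FN} + e^{\varepsilon} \, \mathrm{FP} \ge 1 - \delta, \qquad \mathrm{FP} + e^{\varepsilon} \, \mathrm{FN} \ge 1 - \delta,$$

and therefore

$$\varepsilon \ge \max\left( \ln \frac{1 - \delta - \mathrm{FN}}{\mathrm{FP}}, \; \ln \frac{1 - \delta - \mathrm{FP}}{\mathrm{FN}} \right).$$

Under these assumptions, the error rates of any concrete adversary translate into a lower bound on $\varepsilon$, which is the sense in which an instantiated adversary can be compared against the upper bound from the privacy analysis.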

Paper Structure

This paper contains 51 sections, 14 equations, 10 figures, 13 tables.

Figures (10)

  • Figure 1: Summary of our results, plotting empirically measured $\varepsilon$ when training a model with $\varepsilon=2$ differential privacy on MNIST. The dashed red line corresponds to the certifiable upper bound. Each bar corresponds to the privacy offered against an increasingly powerful adversary. In the most realistic setting, training with privacy offers much more empirically measured privacy. When we provide full attack capabilities, our lower bound shows that the DP-SGD upper bound is tight.
  • Figure 2: Our attack process. Our first algorithm, the crafter, constructs two datasets $D$ and $D'$ differing in one example. The model trainer then (independently of the adversary) trains a model on one of these two datasets. Our second algorithm, the distinguisher, then guesses which dataset was used.
  • Figure 3: Membership inference attack: the adversary only adds one sample from the underlying data distribution.
  • Figure 4: Malicious input attack: the adversary has black-box access. The maliciously crafted input leaks more information than a random sample from the data distribution.
  • Figure 5: Malicious input attack: the adversary has white-box access to the training dataset. The results are slightly better than those of the malicious input attack with black-box access.
  • ...and 5 more figures
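
The attack process summarized in the Figure 2 caption can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's implementation: the "trainer" is a noisy clipped sum standing in for DP-SGD, the canary value, threshold, and all function names (`craft`, `train`, `distinguish`, `eps_lower_bound`, `play`) are hypothetical, and the bound uses point estimates of the error rates where a real measurement would need confidence intervals.

```python
import math
import random


def craft(n=10):
    """Crafter: build two datasets differing in one example (a hypothetical canary)."""
    d = [0.0] * n
    d_prime = d + [1.0]
    return d, d_prime


def train(dataset, sigma, rng):
    """Toy trainer (assumption): clipped sum plus Gaussian noise, standing in for DP-SGD."""
    clipped = [max(-1.0, min(1.0, x)) for x in dataset]
    return sum(clipped) + rng.gauss(0.0, sigma)


def distinguish(output, threshold=0.5):
    """Distinguisher: guess D' (True) when the output exceeds the threshold."""
    return output > threshold


def eps_lower_bound(fp, fn, delta):
    """Point-estimate lower bound on epsilon from false positive/negative rates.

    Uses the standard (eps, delta)-DP hypothesis-testing inequalities.
    Cases with a zero denominator are skipped, since a point estimate is
    uninformative there without confidence intervals.
    """
    candidates = [0.0]
    for a, b in ((fp, fn), (fn, fp)):
        if b > 0 and 1 - delta - a > 0:
            candidates.append(math.log((1 - delta - a) / b))
    return max(candidates)


def play(trials, sigma, delta, seed=0):
    """Run the game many times and return (FP rate, FN rate, epsilon lower bound)."""
    rng = random.Random(seed)
    d, d_prime = craft()
    fp = fn = 0
    for _ in range(trials):
        if distinguish(train(d, sigma, rng)):            # guessed D', truth was D
            fp += 1
        if not distinguish(train(d_prime, sigma, rng)):  # guessed D, truth was D'
            fn += 1
    fp_rate, fn_rate = fp / trials, fn / trials
    return fp_rate, fn_rate, eps_lower_bound(fp_rate, fn_rate, delta)


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 2.0):
        fp_rate, fn_rate, eps = play(trials=2000, sigma=sigma, delta=1e-5)
        print(f"sigma={sigma}: FP={fp_rate:.3f} FN={fn_rate:.3f} eps >= {eps:.2f}")
```

In this toy setting, more noise makes the two datasets harder to tell apart, so the measured lower bound shrinks as `sigma` grows. Restricting what the distinguisher observes would play the same role as the weaker threat models listed in the figures.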