Evaluations of Machine Learning Privacy Defenses are Misleading

Michael Aerni, Jie Zhang, Florian Tramèr

TL;DR

This work identifies severe pitfalls in existing empirical privacy evaluations (based on membership inference attacks) that result in misleading conclusions. It shows that prior evaluations fail to characterize the privacy leakage of the most vulnerable samples, use weak attacks, and avoid comparisons with practical differential privacy baselines.

Abstract

Empirical defenses for machine learning privacy forgo the provable guarantees of differential privacy in the hope of achieving higher utility while resisting realistic adversaries. We identify severe pitfalls in existing empirical privacy evaluations (based on membership inference attacks) that result in misleading conclusions. In particular, we show that prior evaluations fail to characterize the privacy leakage of the most vulnerable samples, use weak attacks, and avoid comparisons with practical differential privacy baselines. In 5 case studies of empirical privacy defenses, we find that prior evaluations underestimate privacy leakage by an order of magnitude. Under our stronger evaluation, none of the empirical defenses we study are competitive with a properly tuned, high-utility DP-SGD baseline (with vacuous provable guarantees).

Paper Structure (29 sections, 7 equations, 13 figures, 3 tables)

Figures (13)

  • Figure 1: Empirical privacy evaluations provide a false sense of security. We study five heuristic defenses and a properly tuned DP-SGD baseline that all achieve $\geq$ 88% accuracy on CIFAR-10. We first run a standard membership inference evaluation and report the attack's TPR at a low FPR across the dataset (following Carlini et al., 2022). Our new evaluation methodology, which adapts the attack to each defense and targets the least-private samples, reveals an order-of-magnitude higher privacy leakage. Our DP-SGD baseline provides better privacy (at similar utility) than all the empirical defenses.
  • Figure 2: Membership inference evaluations should focus on the leakage of the most vulnerable sample(s), which can be approximated efficiently using a canary set. In (a), we train 20,000 shadow models on CIFAR-10 to compute the MI attack success (TPR at 0.1% FPR) independently for each individual sample. We find that the most vulnerable sample is considerably easier to attack than a population-level evaluation suggests. In (b), we show that constructing an appropriate canary set allows us to capture the worst-case privacy leakage in a computationally efficient manner. Note that both plots use a linear y-axis; see the appendix for experimental details.
  • Figure 3: Mislabeled, ambiguous, and atypical samples are most vulnerable to privacy attacks. The most vulnerable CIFAR-10 samples in the setting of Figure 2 are images that are mislabeled (e.g., humans labeled "truck"), ambiguous (e.g., a bird on a car), or atypical (e.g., a boat on land or an airplane without wings). See the appendix for more samples and TPR@0.1% FPR values.
  • Figure 4: Adaptive membership inference attacks that exploit defense-specific mechanisms improve over the standard LiRA attack. We show results for the (a) SSL and (b) HAMP defenses, with LiRA evaluated across the entire dataset.
  • Figure 5: Our improved evaluation uncovers substantial privacy leakage for the most vulnerable samples. While most defenses appear private at the population level (original), our evaluation with strong adaptive attacks targeted at defense-specific canaries reveals large privacy leakage for the most vulnerable samples. Note that the defense that appears to be the most private at the population level (DFKD) is the second-least private at the sample level!
  • ...and 8 more figures
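
The captions above report attack success as the true positive rate (TPR) at a low false positive rate (FPR), for example 0.1%. The following is a minimal sketch of how such a metric can be computed, assuming the attack assigns each sample a scalar score where a larger value means "more likely a member". The function name `tpr_at_fpr`, the strict-threshold tie handling, and the toy scores are illustrative assumptions and are not taken from the paper.

```python
import math
from typing import Sequence


def tpr_at_fpr(member_scores: Sequence[float],
               non_member_scores: Sequence[float],
               target_fpr: float) -> float:
    """TPR of a score-based attack at the lowest threshold with FPR <= target_fpr.

    A sample is flagged as a member when its score is strictly greater
    than the threshold (an assumption of this sketch).
    """
    if not member_scores or not non_member_scores:
        raise ValueError("both score lists must be non-empty")
    if not 0.0 <= target_fpr <= 1.0:
        raise ValueError("target_fpr must lie in [0, 1]")

    n_non_members = len(non_member_scores)
    # How many non-members may be wrongly flagged within the FPR budget.
    # The small epsilon guards against float rounding such as 0.3 * 10 = 2.999...
    allowed_fp = math.floor(target_fpr * n_non_members + 1e-9)

    if allowed_fp >= n_non_members:
        # Budget covers every non-member, so every sample can be flagged.
        threshold = -math.inf
    else:
        # Use the (allowed_fp + 1)-th largest non-member score as threshold:
        # at most allowed_fp non-members score strictly above it.
        threshold = sorted(non_member_scores, reverse=True)[allowed_fp]

    true_positives = sum(1 for s in member_scores if s > threshold)
    return true_positives / len(member_scores)


if __name__ == "__main__":
    # Toy attack scores, purely illustrative.
    members = [0.9, 0.8, 0.4, 0.3]
    non_members = [0.85, 0.5, 0.2, 0.1, 0.05, 0.3, 0.25, 0.15, 0.12, 0.01]
    print(tpr_at_fpr(members, non_members, target_fpr=0.1))
```

With a realistic number of non-member scores, the same function can be called with `target_fpr=0.001` to match the 0.1% FPR operating point used in the captions. Applying it to the scores of a single sample across many shadow models, rather than to the whole dataset, gives the per-sample view that Figure 2 contrasts with the population-level one.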