Calibrating Practical Privacy Risks for Differentially Private Machine Learning

Yuechun Gu, Keke Chen

TL;DR

Extensive experiments show an inherent link between the LiRA attack success rate (ASR) and a dataset's privacy risk for a specific modeling task, and show that carefully masking features preserves more data utility while giving equivalent practical privacy protection under relaxed $ε$ settings.

Abstract

Differential privacy quantifies privacy through the privacy budget $ε$, yet its practical interpretation is complicated by variations across models and datasets. Recent research on differentially private machine learning and membership inference has shown that, under the same theoretical $ε$ setting, the attack success rate (ASR) of the likelihood-ratio-based membership inference attack (LiRA) varies with the specific dataset and model, and this ASR may be a better indicator of real-world privacy risk. Inspired by this practical privacy measure, we study approaches that lower the attack success rate and thereby allow more flexible privacy budget settings in model training. We find that selectively suppressing privacy-sensitive features achieves lower ASR values without compromising application-specific data utility. We use the SHAP and LIME model explainers to evaluate feature sensitivities and develop feature-masking strategies. Our findings demonstrate that the LiRA $ASR^M$ on model $M$ properly indicates the inherent privacy risk of a dataset for modeling, and that it is possible to modify datasets so that larger theoretical $ε$ settings achieve equivalent practical privacy protection. Extensive experiments show the inherent link between ASR and the dataset's privacy risk. By carefully selecting features to mask, we can preserve more data utility with equivalent practical privacy protection and relaxed $ε$ settings. The implementation is available at https://anonymous.4open.science/r/On-sensitive-features-and-empirical-epsilon-lower-bounds-BF67/.
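The abstract uses the LiRA attack success rate as its practical privacy measure. The sketch below illustrates the likelihood-ratio test in its commonly described online form, assuming each candidate example's true-class confidence has already been collected from the target model and from shadow models trained with (IN) and without (OUT) that example. Shadow-model training is out of scope, all function names are hypothetical, and ASR is assumed here to be the accuracy of the thresholded membership decision on a balanced member/non-member set; the paper's exact definition may differ.

```python
import math
from statistics import mean, stdev


def logit_scale(p, eps=1e-12):
    """Map a true-class confidence p to log(p / (1 - p)), the transform LiRA
    applies so that per-example confidences are approximately Gaussian."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p) - math.log(1.0 - p)


def gaussian_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def lira_score(target_conf, in_confs, out_confs, min_std=1e-3):
    """Online LiRA: log-likelihood ratio of the target model's scaled confidence
    under Gaussians fitted to IN and OUT shadow-model confidences.
    Positive scores favour 'member'. Needs at least two IN and two OUT values."""
    phi = logit_scale(target_conf)
    phi_in = [logit_scale(c) for c in in_confs]
    phi_out = [logit_scale(c) for c in out_confs]
    mu_in, sd_in = mean(phi_in), max(stdev(phi_in), min_std)
    mu_out, sd_out = mean(phi_out), max(stdev(phi_out), min_std)
    return gaussian_logpdf(phi, mu_in, sd_in) - gaussian_logpdf(phi, mu_out, sd_out)


def attack_success_rate(scores, is_member, threshold=0.0):
    """Assumed ASR: fraction of examples where 'score > threshold' matches
    the true membership label (meaningful on a balanced member/non-member set)."""
    correct = sum((s > threshold) == m for s, m in zip(scores, is_member))
    return correct / len(scores)


if __name__ == "__main__":
    # Made-up confidences: shadow models that trained on the example are
    # more confident on it than shadow models that did not.
    in_confs = [0.97, 0.95, 0.98, 0.96]
    out_confs = [0.70, 0.60, 0.75, 0.65]
    scores = [lira_score(0.97, in_confs, out_confs), lira_score(0.62, in_confs, out_confs)]
    print(attack_success_rate(scores, [True, False]))  # 1.0
```

A lower ASR means the attacker separates members from non-members less reliably; the abstract's claim is that masking privacy-sensitive features lowers this value at a given theoretical $ε$.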

Paper Structure

This paper contains 17 sections, 7 equations, 10 figures, 2 tables, and 1 algorithm.

Figures (10)

  • Figure 1: $ASR^M$ variation across different datasets and modified versions of the same dataset. All models are trained using DP-SGD with a theoretical $\epsilon=8$. Abbreviations: M (MNIST), EM (EMNIST), FM (Fashion-MNIST), C10 (CIFAR-10), C100 (CIFAR-100), IN (ImageNet-1k subset), R (Randomly generated dataset).
  • Figure 2: An example of feature utility sensitivity and privacy sensitivity
  • Figure 3: Pipeline of feature masking
  • Figure 4: (a) Decreasing trend of $ASR^M$ as the top-$k$ most important features are masked for CIFAR-10 and ImageNet-1K. (b) For MNIST, $ASR^M$ decreases as the percentage of randomized labels (representing reduced data quality) increases.
  • Figure 5: The relationship between $ASR^{M, \epsilon}$ and theoretical $\epsilon$ over utility models for the optimized-masked, randomly masked, and original datasets. The $\alpha$ parameter for optimized feature masking is set to 0.1 for JAFFE, 0.2 for both RaFD and TFEID, and 0.3 for 100-Driver, as suggested by the later figure on the choice of $\alpha$. The random feature masking approach omits 30% of the features at random, i.e., the same number of features masked in the optimized feature masking setting.
  • ...and 5 more figures
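The figure captions contrast three dataset modifications: masking the top-$k$ most important features, masking the same number of features at random (30% in Figure 5), and randomizing a fraction of labels. A minimal sketch of these transformations follows, assuming each sample is a flat list of feature values and that a per-feature score (for example, a mean absolute SHAP or LIME attribution) has already been computed. The function names, the zero fill value, and uniform relabeling are illustrative assumptions; the paper's $\alpha$-weighted optimized selection is not specified in this summary, so the sketch simply takes the score as given.

```python
import random


def select_top_k(scores, k):
    """Indices of the k features with the highest score (ties keep original order)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def select_random(n_features, k, seed=0):
    """Random-masking baseline: k distinct feature indices drawn uniformly."""
    return random.Random(seed).sample(range(n_features), k)


def apply_mask(dataset, indices, fill=0.0):
    """Overwrite the chosen feature indices with `fill` in every sample."""
    masked = set(indices)
    return [[fill if j in masked else x for j, x in enumerate(row)] for row in dataset]


def randomize_labels(labels, fraction, num_classes, seed=0):
    """Reassign a `fraction` of labels uniformly at random (assumed protocol;
    a reassigned label may coincide with the original one)."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(labels)), round(fraction * len(labels)))
    noisy = list(labels)
    for i in chosen:
        noisy[i] = rng.randrange(num_classes)
    return noisy


if __name__ == "__main__":
    # Made-up scores for a 10-feature toy dataset, not values from the paper.
    scores = [0.05, 0.40, 0.10, 0.30, 0.02, 0.08, 0.01, 0.04, 0.00, 0.00]
    k = round(0.3 * len(scores))  # mask 30% of the features
    data = [[float(j) for j in range(10)], [1.0] * 10]
    top = select_top_k(scores, k)
    print(top)  # [1, 3, 2]
    print(apply_mask(data, top)[0])
    print(apply_mask(data, select_random(len(scores), k))[0])
```

Selecting a single global set of indices and applying it to every sample keeps the masked datasets comparable: the top-$k$ and random variants remove exactly the same number of features, which matches the comparison described for Figure 5.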