Personalized Differential Privacy for Ridge Regression

Krishna Acharya; Franziska Boenisch; Rakshit Naidu; Juba Ziani

TL;DR

Standard differential privacy applies one uniform privacy budget to every data point. This work removes that limitation for ridge regression by introducing Personalized-DP Output Perturbation (PDP-OP), which supports per-data-point privacy levels by combining per-point weights with a controlled noise mechanism. It provides formal privacy proofs and new accuracy guarantees tailored to personalized DP, situating the method within private ERM and backing earlier, purely empirical work with theoretical assurances. Empirically, PDP-OP significantly improves the privacy-utility trade-off over standard DP and over the personalized approach of Jorgensen et al. (2015) on both synthetic and real data, with lower loss and reduced variability. The approach gives finer-grained privacy control for privacy-sensitive ML tasks.

Abstract

The increased application of machine learning (ML) in sensitive domains requires protecting the training data through privacy frameworks such as differential privacy (DP). DP requires specifying a uniform privacy level $\varepsilon$ that expresses the maximum privacy loss that each data point in the entire dataset is willing to tolerate. Yet, in practice, different data points often have different privacy requirements. Having to set one uniform privacy level is usually too restrictive, often forcing a learner to guarantee the most stringent privacy requirement, at a large cost to accuracy. To overcome this limitation, we introduce our novel Personalized-DP Output Perturbation method (PDP-OP), which enables training ridge regression models with individual per-data-point privacy levels. We provide rigorous privacy proofs for PDP-OP as well as accuracy guarantees for the resulting model. This work is the first to provide such theoretical accuracy guarantees for personalized DP in machine learning, whereas previous work only provided empirical evaluations. We empirically evaluate PDP-OP on synthetic and real datasets and with diverse privacy distributions. We show that by enabling each data point to specify its own privacy requirement, we can significantly improve the privacy-accuracy trade-offs in DP. We also show that PDP-OP outperforms the personalized privacy techniques of Jorgensen et al. (2015).

Paper Structure (31 sections, 8 theorems, 32 equations, 18 figures, 14 tables, 1 algorithm)

Introduction
Preliminaries and Related Work
Differential Privacy.
Personalized Privacy.
Private Empirical Risk Minimization.
Algorithms and Guarantees for Personalized Privacy in Ridge Regression
Our Setup.
Our Main Algorithm.
Privacy Guarantees
Accuracy Guarantees
Experiments
Choice of Privacy Budgets.
Synthetic Data Generation.
Real Dataset.
Improvements over standard Differential Privacy
...and 16 more sections

Key Result

Theorem 3.2

Fix privacy specifications $\varepsilon_1, \ldots, \varepsilon_n > 0$. Let $B(\lambda) = \min \left(\frac{1}{\sqrt{\lambda}}, \frac{\sqrt{d}}{\lambda}\right)$. Algorithm Alg:output_boundedlabel with parameters $w_i = \frac{\varepsilon_i}{\sum_{j=1}^n \varepsilon_j}$ for all $i$ and $\eta = \frac{\la
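The two quantities that are fully stated in the theorem above, the weights $w_i = \varepsilon_i / \sum_j \varepsilon_j$ and the bound $B(\lambda)$, can be computed directly. The sketch below does that and, as an illustration only, plugs the weights into a weighted ridge objective of the form $\sum_i w_i (y_i - x_i^\top \theta)^2 + \lambda \|\theta\|^2$. That objective, its scaling, and the function names are assumptions made for this example, not taken from the paper. The noise-addition step is left out because it depends on $\eta$, whose value is not reproduced above.

```python
import numpy as np


def personalized_weights(eps):
    """Weights w_i = eps_i / sum_j eps_j, as in the theorem statement."""
    eps = np.asarray(eps, dtype=float)
    if np.any(eps <= 0):
        raise ValueError("all privacy levels must be strictly positive")
    return eps / eps.sum()


def bound_B(lam, d):
    """B(lambda) = min(1/sqrt(lambda), sqrt(d)/lambda) for dimension d."""
    return min(1.0 / np.sqrt(lam), np.sqrt(d) / lam)


def weighted_ridge(X, y, w, lam):
    """Non-private weighted ridge estimate (hypothetical objective).

    Solves argmin_theta sum_i w_i (y_i - x_i^T theta)^2 + lam * ||theta||^2
    via the normal equations. No privacy noise is added here.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    d = X.shape[1]
    A = X.T @ (w[:, None] * X) + lam * np.eye(d)
    b = X.T @ (w * y)
    return np.linalg.solve(A, b)


if __name__ == "__main__":
    eps = [0.01, 0.2, 1.0]          # example privacy levels
    w = personalized_weights(eps)
    print("weights:", w)
    print("B(lambda=4, d=9):", bound_B(4.0, 9))
```

Under these weights, a data point with a larger $\varepsilon_i$ (a looser privacy requirement) contributes more to the fitted model than a point with a smaller one.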

Figures (18)

Figure 1: Lower standard deviation compared to Jorgensen et al. (2015) on the synthetic dataset, while varying the regularization parameter $\lambda$, keeping $\varepsilon_c = 0.01, \varepsilon_m = 0.2, \varepsilon_l = 1.0, f_c = 0.34, f_m = 0.43, f_l = 0.23$.
Figure 2: Lower standard deviation compared to Jorgensen et al. (2015) on the Medical cost dataset, while varying the regularization parameter $\lambda$, keeping $\varepsilon_c = 0.01, \varepsilon_m = 0.2, \varepsilon_l = 1.0, f_c = 0.34, f_m = 0.43, f_l = 0.23$.
Figure 3: Lower loss compared to Jorgensen et al. (2015) on the synthetic dataset while varying $\varepsilon_c$ (privacy level of the conservative users), keeping $\varepsilon_m = 0.5, \varepsilon_l = 1.0, f_c = 0.54, f_m = 0.37, f_l = 0.09$.
Figure 4: Lower loss compared to Jorgensen et al. (2015) on the Medical costs dataset while varying $\varepsilon_c$ (privacy level of the conservative users), keeping $\varepsilon_m = 0.5, \varepsilon_l = 1.0, f_c = 0.54, f_m = 0.37, f_l = 0.09$.
Figure 5: Lower loss compared to Jorgensen et al. (2015) on the synthetic dataset while varying $\varepsilon_m$ (privacy level of pragmatists), keeping $\varepsilon_c = 0.01, \varepsilon_l = 1.0, f_c = 0.54, f_m = 0.37, f_l = 0.09$.
...and 13 more figures
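The figure captions describe each privacy distribution by three privacy levels $(\varepsilon_c, \varepsilon_m, \varepsilon_l)$ and three population fractions $(f_c, f_m, f_l)$. A minimal sketch of how such per-point budgets could be generated is given below. It assumes each data point is assigned to a group independently at random with probabilities $(f_c, f_m, f_l)$; the paper may instead assign exact fractions, and the function name and seed are hypothetical.

```python
import numpy as np


def sample_privacy_budgets(n, eps_levels, fractions, seed=0):
    """Draw one privacy level per data point.

    eps_levels: (eps_c, eps_m, eps_l), the privacy level of each group.
    fractions:  (f_c, f_m, f_l), the share of each group; must sum to 1.
    Assumption: group membership is sampled i.i.d. per data point.
    """
    eps_levels = np.asarray(eps_levels, dtype=float)
    fractions = np.asarray(fractions, dtype=float)
    if not np.isclose(fractions.sum(), 1.0):
        raise ValueError("group fractions must sum to 1")
    rng = np.random.default_rng(seed)
    groups = rng.choice(len(eps_levels), size=n, p=fractions)
    return eps_levels[groups]


if __name__ == "__main__":
    # Setting from the captions of Figures 1 and 2.
    eps = sample_privacy_budgets(
        n=1000,
        eps_levels=(0.01, 0.2, 1.0),
        fractions=(0.34, 0.43, 0.23),
    )
    levels, counts = np.unique(eps, return_counts=True)
    print(dict(zip(levels.tolist(), (counts / eps.size).tolist())))
```

The resulting array can serve as the privacy specifications $\varepsilon_1, \ldots, \varepsilon_n$ in Theorem 3.2.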

Theorems & Definitions (18)

Definition 2.1: $i$-neighboring
Definition 2.2: Personalized DP
Remark 3.1: How to sample $Z$
Theorem 3.2
Theorem 3.4
proof : Proof Sketch
Theorem 3.6: Accuracy of $\hat{\theta}$
proof
Lemma A.1
proof
...and 8 more
