
Multi-user Pufferfish Privacy

Ni Ding, Songpei Lu, Wenjing Yang, Zijian Zhang

TL;DR

The paper tackles individual indistinguishability in multi-user aggregated queries under the pufferfish privacy framework, calibrating Laplace noise with the Kantorovich (Wasserstein-1) mechanism. It derives explicit sufficient conditions for four secret-pair sets, covering value changes, presence/absence, and distribution shifts, and shows that the guarantee for an individual depends only on the statistics of that user, with the attendance probability largely inconsequential. For binary data, the authors give relaxed conditions that reduce noise and improve data utility, and they use a numerical root finder (Brent's method) to approximate the noise scale under the relaxed conditions. Numerical experiments, including three UCI datasets, illustrate the privacy protection obtained when a class of users is added, removed, or modified, offering a tractable approach to privacy-utility trade-offs in tabular data releases.
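Pufferfish privacy on a secret pair $(s_i, s_j)$ asks that the density of the noisy answer $Y + N_\theta$ given $s_i$ be within a factor $e^{\epsilon}$ of its density given $s_j$, at every output. The sketch below is a minimal, standard-library illustration, not the authors' code: it draws Laplace noise and checks that ratio on a grid. The helper names and the distribution of the other users' sum are hypothetical, and the aggregated query is assumed to be a sum of user reports. A grid check is only a sanity check, not a proof.

```python
import math
import random

def laplace_pdf(z, loc, theta):
    """Density of the Laplace(loc, theta) distribution at z."""
    return math.exp(-abs(z - loc) / theta) / (2.0 * theta)

def noisy_density(dist, z, theta):
    """Density at z of Y + N_theta, where Y has the discrete law `dist` (value -> probability)."""
    return sum(p * laplace_pdf(z, y, theta) for y, p in dist.items())

def max_log_ratio(dist_i, dist_j, theta, grid):
    """Largest |log f_i(z) - log f_j(z)| over the grid; pufferfish privacy needs this <= epsilon."""
    return max(
        abs(math.log(noisy_density(dist_i, z, theta)) - math.log(noisy_density(dist_j, z, theta)))
        for z in grid
    )

def release(y, theta, rng=random):
    """Return y + Laplace(0, theta) noise, sampled as the difference of two Exp(mean theta) draws."""
    return y + rng.expovariate(1.0 / theta) - rng.expovariate(1.0 / theta)

# Hypothetical example: sum S of the other users' reports, and user 4 reporting 5 versus 3.
others = {0: 0.25, 1: 0.5, 2: 0.25}
given_a = {s + 5: p for s, p in others.items()}  # law of Y given secret s_{a_4}
given_b = {s + 3: p for s, p in others.items()}  # law of Y given secret s_{b_4}
epsilon = 1.0
grid = [k / 10.0 for k in range(-200, 301)]      # z in [-20, 30]

print(max_log_ratio(given_a, given_b, 2.0 / epsilon, grid))  # about epsilon: tight in the tails
print(max_log_ratio(given_a, given_b, 1.0 / epsilon, grid))  # exceeds epsilon: too little noise
print(release(7, 2.0 / epsilon))                              # one noisy answer
```

Sampling Laplace noise as a difference of two exponentials keeps the sketch dependency-free; a NumPy generator's Laplace sampler would serve equally well.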

Abstract

This paper studies how to achieve individual indistinguishability through pufferfish privacy in aggregated queries to a multi-user system. Each user is assumed to report a realization of a random variable. We study how to calibrate the Laplace noise added to the query answer to attain pufferfish privacy when a user changes his/her reported data value, leaves the system, or is replaced by another user with different randomness. Sufficient conditions for attaining statistical indistinguishability are derived for all scenarios on four sets of secret pairs, using the existing Kantorovich method (Wasserstein metric of order $1$). These results can be applied to attain indistinguishability when a certain class of users is added to or removed from tabular data. It is revealed that attaining indistinguishability of an individual's data depends only on the statistics of that user. For binary (Bernoulli-distributed) random variables, the derived sufficient conditions can be further relaxed to reduce the noise and improve data utility.
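To make the binary case concrete, the derivation below applies the relaxed condition $\mathds{E}_{D_i \sim P_{D_i}}[e^{|D_i|/\theta}] \leq e^{\epsilon}$ (the condition solved numerically in the Figure 4 experiment) to a single user with $D_i \sim \mathrm{Bern}(p)$. It is a worked illustration of ours, not necessarily the Bernoulli-specific relaxation derived in the paper; for a binary user the condition has a closed form, so no root finder is needed.

```latex
% Single binary user: D_i ~ Bernoulli(p) with 0 < p <= 1.
\mathds{E}_{D_i \sim \mathrm{Bern}(p)}\big[e^{|D_i|/\theta}\big]
  = (1-p) + p\, e^{1/\theta} \le e^{\epsilon}
\iff e^{1/\theta} \le 1 + \frac{e^{\epsilon}-1}{p}
\iff \theta \ge \theta_{\mathrm{relax}} := \frac{1}{\ln\!\big(1 + \frac{e^{\epsilon}-1}{p}\big)} .
% Since p <= 1, 1 + (e^eps - 1)/p >= e^eps, so theta_relax <= 1/epsilon.
```

For example, $p = 0.5$ and $\epsilon = 1$ give $\theta_{\mathrm{relax}} = 1/\ln(2e-1) \approx 0.67$, whereas the worst-case change $|D_i| \leq 1$ of a binary user would suggest $\theta = 1/\epsilon = 1$.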


Paper Structure

This paper contains 21 sections, 9 theorems, 34 equations, 9 figures, 6 tables.

Key Result

Proposition 1

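Adding Laplace noise $N_\theta$ with scale parameter $\theta = \frac{1}{\epsilon} \sup_{(s_i, s_j) \in \mathds{S}} \sup_{(x,x') \in \text{supp}(\pi^*_{i,j})} |x - x'|$, where $\pi^*_{i,j}$ is the Kantorovich ($W_1$) optimal transport plan between the distributions of $Y$ conditioned on $s_i$ and on $s_j$, attains $(\epsilon, \mathds{S})$-pufferfish privacy in $Y$.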
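Proposition 1 reduces noise calibration to one number per secret pair: the largest displacement $|x - x'|$ in the support of a $W_1$-optimal transport plan. The sketch below assumes finitely supported, real-valued query answers and uses the monotone (quantile) coupling as $\pi^*$; in one dimension this coupling is $W_1$-optimal and also attains the smallest possible $\sup |x - x'|$ of any coupling, so other optimal plans cannot give a smaller scale. The helper names and all numbers are hypothetical, not the values in the paper's tables, and the aggregated query is assumed to be a sum of user reports.

```python
TOL = 1e-12  # masses at or below this are treated as zero (floating-point slack)

def monotone_coupling(p, q):
    """Quantile (monotone) coupling of two discrete 1-D laws given as {value: probability}.

    In one dimension this coupling minimises E|X - X'|, so it is a Kantorovich (W_1)
    optimal transport plan. Returns a list of (x, x', mass) triples.
    """
    xs = sorted((v, m) for v, m in p.items() if m > TOL)
    ys = sorted((v, m) for v, m in q.items() if m > TOL)
    plan, i, j = [], 0, 0
    rx, ry = xs[0][1], ys[0][1]  # mass still unassigned at xs[i] and ys[j]
    while i < len(xs) and j < len(ys):
        m = min(rx, ry)
        plan.append((xs[i][0], ys[j][0], m))
        rx, ry = rx - m, ry - m
        if rx <= TOL:  # xs[i] exhausted: move to the next support point
            i += 1
            rx = xs[i][1] if i < len(xs) else 0.0
        if ry <= TOL:  # ys[j] exhausted: move to the next support point
            j += 1
            ry = ys[j][1] if j < len(ys) else 0.0
    return plan

def kantorovich_theta(pairs, epsilon):
    """Laplace scale: max over secret pairs of sup_{(x,x') in supp(pi*)} |x - x'|, divided by epsilon."""
    return max(max(abs(x - y) for x, y, _ in monotone_coupling(p, q)) for p, q in pairs) / epsilon

def add_independent(dist, d):
    """Law of Y + D for independent Y ~ dist and D ~ d (both {value: probability})."""
    out = {}
    for y, my in dist.items():
        for v, mv in d.items():
            out[y + v] = out.get(y + v, 0.0) + my * mv
    return out

# Hypothetical numbers (not the paper's tables); the query is assumed to be a sum.
others = {0: 0.25, 1: 0.5, 2: 0.25}   # law of the other users' sum S
p4 = {0: 0.5, 5: 0.5}                 # hypothetical P_4 for the S_{P,perp} case
epsilon = 1.0
theta_ab = kantorovich_theta([(add_independent(others, {5: 1.0}), add_independent(others, {3: 1.0}))], epsilon)
theta_a_perp = kantorovich_theta([(add_independent(others, {5: 1.0}), others)], epsilon)
theta_p_perp = kantorovich_theta([(add_independent(others, p4), others)], epsilon)
print(theta_ab, theta_a_perp, theta_p_perp)  # -> 2.0 5.0 5.0
```

Passing several pairs to `kantorovich_theta` takes the supremum over a whole secret-pair set $\mathds{S}$, as in the proposition.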

Figures (9)

  • Figure 1: Assume the first three users in Table \ref{tab:ExpSetting} and consider $\mathds{S}_{a,b} = \{(s_{a_4}, s_{b_4})\}$ for attaining indistinguishability between user $4$ reporting $a_4 = 5$ and reporting $b_4 = 3$. The Kantorovich optimal transport plan $\pi^*$ is shown in the figure, where $\sup_{(x,x') \in \text{supp}(\pi^*)} |x - x'| = 2$. By Proposition 1, adding Laplace noise with scale parameter $\theta = \frac{2}{\epsilon}$ attains $(\epsilon,\mathds{S}_{a,b})$-pufferfish privacy.
  • Figure 2: Assume the first three users in Table \ref{tab:ExpSetting} and consider $\mathds{S}_{a,\perp} = \{(s_{a_4}, s_{\perp_4})\}$ for attaining indistinguishability between user $4$ being present and reporting $a_4 = 5$ and user $4$ being absent. The resulting Kantorovich optimal transport plan $\pi^*$ is shown in the figure, where $\sup_{(x,x') \in \text{supp}(\pi^*)} |x - x'| = 5$. By Proposition 1, adding Laplace noise with scale parameter $\theta = \frac{5}{\epsilon}$ attains $(\epsilon,\mathds{S}_{a,\perp})$-pufferfish privacy, which is equivalent to $(\epsilon,\mathds{S}_{a,0})$-pufferfish privacy (Remark \ref{rem:SabSaperp}).
  • Figure 3: For the first three users in Table \ref{tab:ExpSetting}, consider adding a $4$th user with $D_4 \sim P_4(\cdot)$ as given in Table \ref{tab:ExpSettingExtraTwo}. For the secret pair set $\mathds{S}_{P,\perp} = \{(s_{P_4}, s_{\perp_4})\}$, the resulting Kantorovich optimal transport plan $\pi^*$ is shown in the figure, where $\sup_{(x,x') \in \text{supp}(\pi^*)} |x - x'| = 5$. By Proposition 1, adding Laplace noise with scale parameter $\theta = \frac{5}{\epsilon}$ attains $(\epsilon,\mathds{S}_{P,\perp})$-pufferfish privacy.
  • Figure 4: For the experiment in Figure \ref{fig:SPperp}, we use Brent's method to approximate a $\theta$ that satisfies the relaxed condition $\mathds{E}_{D_i \sim P_{D_i}} [e^{\frac{|D_i|}{\theta}} ] \leq e^{\epsilon}$ in (\ref{eq:SuffCondSPperpRelax}). Compared to (\ref{eq:SuffCondSPperp}), the value of $\theta$ is reduced, i.e., less noise is needed and data utility improves, while pufferfish privacy is still attained.
  • Figure 5: Two values of $\theta$, produced by (\ref{eq:SuffCondSPperpRelax}) and (\ref{eq:SuffCondSPperp}) in Theorem \ref{theo:SuffCondSPperp}, for achieving $\epsilon$-pufferfish privacy on $\mathds{S}_{P,\perp} = \{(s_{P_i}, s_{\perp_i})\}$ for three datasets from the UCI Machine Learning Repository, where $P_i(\cdot) = \Pr(\texttt{education} = \cdot \mid \texttt{race} = \texttt{'white'})$, $P_i(\cdot) = \Pr(\texttt{romantic} = \cdot \mid \texttt{higher} = \texttt{'yes'})$ and $P_i(\cdot) = \Pr(\texttt{marital} = \cdot \mid \texttt{loan} = \texttt{'yes'})$ for the adult, student performance and bank marketing datasets, respectively.
  • ...and 4 more figures
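The Figure 4 experiment solves the relaxed condition $\mathds{E}_{D_i \sim P_{D_i}}[e^{|D_i|/\theta}] \leq e^{\epsilon}$ numerically. Because the expectation is non-increasing in $\theta$, the smallest admissible $\theta$ is the root of $\mathds{E}[e^{|D_i|/\theta}] - e^{\epsilon}$. The sketch below finds it for a hypothetical finitely supported $P_{D_i}$; the paper uses Brent's method, whereas we swap in plain bisection to stay dependency-free (SciPy's `scipy.optimize.brentq` is a standard Brent root finder that could replace the loop). The bracket starts at $\max|D_i|/\epsilon$, where the condition is guaranteed to hold.

```python
import math

def relaxed_gap(theta, dist, epsilon):
    """E_{D ~ dist}[exp(|D| / theta)] - exp(epsilon); non-increasing in theta."""
    try:
        return sum(m * math.exp(abs(v) / theta) for v, m in dist.items()) - math.exp(epsilon)
    except OverflowError:  # exp overflowed: the expectation is astronomically large
        return math.inf

def relaxed_theta(dist, epsilon, rel_tol=1e-10):
    """Approximately the smallest theta with E[exp(|D|/theta)] <= exp(epsilon), by bisection."""
    hi = max(abs(v) for v, m in dist.items() if m > 0) / epsilon
    if hi == 0.0:
        return 0.0  # D is identically 0: any theta > 0 satisfies the condition
    # At theta = max|D| / epsilon the condition holds, since E[exp(|D|/theta)] <= exp(epsilon).
    lo = hi / 2.0
    while relaxed_gap(lo, dist, epsilon) <= 0:  # walk down until the condition fails
        hi, lo = lo, lo / 2.0
    while hi - lo > rel_tol * hi:  # invariant: gap(lo) > 0 >= gap(hi)
        mid = 0.5 * (lo + hi)
        if relaxed_gap(mid, dist, epsilon) <= 0:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical P: D_i is 0 or 5 with equal probability.
p = {0: 0.5, 5: 0.5}
print(relaxed_theta(p, 1.0))  # smaller than the worst-case scale max|D| / epsilon = 5.0
```

Returning the upper end of the final bracket keeps the answer on the side where the condition holds, so the released noise never falls below the relaxed requirement.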

Theorems & Definitions (12)

  • Proposition 1: $W_1$ (Kantorovich) mechanism [Ding2022AISTATS]
  • Theorem 1
  • Theorem 2
  • Remark 1
  • Theorem 3
  • Remark 2
  • Theorem 4
  • Lemma 1
  • Lemma 2: Relaxed sufficient condition
  • Lemma 3
  • ...and 2 more