Guarding Multiple Secrets: Enhanced Summary Statistic Privacy for Data Sharing

Shuaiqi Wang, Rongzhe Wei, Mohsen Ghassemi, Eleonora Kreacic, Vamsi K. Potluru

TL;DR

This work extends summary statistics privacy to multi-secret settings in data sharing by defining interpretable privacy metrics (union, intersection, group) and a distortion metric based on Wasserstein-2 distance. It derives general lower bounds linking distributional distance and secret gaps, analyzes multiple privacy notions, and introduces a practical quantization-based release mechanism with near-optimal privacy-distortion performance. A case study on multivariate Gaussian distributions demonstrates how the bounds specialize (e.g., $\gamma^{\text{union}}=\sqrt{d}/2$) and guides mechanism design. Empirical evaluation on real data (Wikipedia Web Traffic) shows the mechanism achieves favorable privacy-utility tradeoffs compared to DP, AP, and DistP, validating the framework's applicability to real-world data sharing scenarios.

Abstract

Data sharing enables critical advances in many research areas and business applications, but it may lead to inadvertent disclosure of sensitive summary statistics (e.g., means or quantiles). Existing literature only focuses on protecting a single confidential quantity, while in practice, data sharing involves multiple sensitive statistics. We propose a novel framework to define, analyze, and protect multi-secret summary statistics privacy in data sharing. Specifically, we measure the privacy risk of any data release mechanism by the worst-case probability of an attacker successfully inferring summary statistic secrets. Given an attacker's objective spanning from inferring a subset to the entirety of summary statistic secrets, we systematically design and analyze tailored privacy metrics. Defining the distortion as the worst-case distance between the original and released data distribution, we analyze the tradeoff between privacy and distortion. Our contribution also includes designing and analyzing data release mechanisms tailored for different data distributions and secret types. Evaluations on real-world data demonstrate the effectiveness of our mechanisms in practical applications.

Paper Structure

This paper contains 52 sections, 24 theorems, 163 equations, 5 figures, 3 tables, 3 algorithms.

Key Result

Theorem 4.1

Let $D\left( X_{\theta_1}, X_{\theta_2} \right) = \frac{1}{2} \mathfrak{D}\left( \omega_{X_{\theta_1}} \,\|\, \omega_{X_{\theta_2}} \right)$. Further, let $R^{\text{union}}(X_{\theta_1}, X_{\theta_2}) = \prod_{i\in [d]} \lvert g_{i}(\theta_1) - g_{i}(\theta_2) \rvert^{1/d}$ and … Then, for tolerance ranges $\{\epsilon_i\}$ as defined in eq:privacy and for any mechanism $\mathcal{M}_g$ subject …
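
As a small illustration of the quantity $R^{\text{union}}$ in the statement above: it is the geometric mean of the per-secret gaps $\lvert g_i(\theta_1) - g_i(\theta_2) \rvert$ over the $d$ secrets. The sketch below only evaluates that formula; the function name `r_union` and the convention of passing the secret values $g_i(\theta_1)$ and $g_i(\theta_2)$ as plain lists of floats are assumptions made for illustration and do not come from the paper.

```python
import math
from typing import Sequence


def r_union(secrets_1: Sequence[float], secrets_2: Sequence[float]) -> float:
    """Geometric mean of the absolute per-secret gaps.

    secrets_1[i] plays the role of g_i(theta_1) and secrets_2[i] the role of
    g_i(theta_2); both names are hypothetical and used for illustration only.
    """
    if len(secrets_1) != len(secrets_2) or len(secrets_1) == 0:
        raise ValueError("both secret vectors must be non-empty and of equal length")
    d = len(secrets_1)
    # prod_i |g_i(theta_1) - g_i(theta_2)|^(1/d)
    return math.prod(abs(a - b) ** (1.0 / d) for a, b in zip(secrets_1, secrets_2))


if __name__ == "__main__":
    # Two secrets with gaps 1 and 4: the geometric mean is sqrt(1 * 4).
    print(r_union([0.0, 0.0], [1.0, 4.0]))
    # If any single secret coincides, the product (and hence R^union) is zero.
    print(r_union([1.0, 2.0, 3.0], [1.0, 5.0, 7.0]))
```

Because the gaps enter as a product, $R^{\text{union}}$ vanishes whenever the two parameters agree on any one secret, which is worth keeping in mind when reading the bound.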

Figures (5)

  • Figure 1: Attacker construction for the proof of the general tradeoff theorem (thm:trade_off_general) in the $2$-secret case. The true value of the secret vector lies in the intersection of the highlighted bands.
  • Figure 2: Privacy (lower is better) and distortion of the mechanism (mech:dGaussian_diagnol) on the Wikipedia Web Traffic (WWT) dataset under different privacy metrics. For each metric, the solid line of the same color represents the theoretical lower bound.
  • Figure 3: Privacy and distortion (lower values are better) of DP, AP, DistP, and ours under different privacy metrics. The solid line represents the theoretical lower bound of the achievable region.
  • Figure 4: Privacy (lower is better) and distortion of the mechanism (mech:dGaussian_diagnol) on the Wikipedia Web Traffic dataset under different privacy metrics. For each privacy formulation, the solid line of the same color represents the theoretical lower bound of the achievable region.
  • Figure 5: Privacy and distortion (lower values are better) of DP, AP, DistP, and ours under $\ell_1$-norm privacy. The solid line represents the theoretical lower bound of the achievable region.

Theorems & Definitions (45)

  • Theorem 4.1: Lower bound of privacy-distortion tradeoff
  • Theorem 5.1: Lower bound of privacy-distortion tradeoff for intersection privacy
  • Proposition 5.2
  • Theorem 5.3: Lower bound of privacy-distortion tradeoff for group secrets privacy
  • Proposition 6.1
  • Proposition 6.2: Mechanism privacy-distortion tradeoff
  • proof
  • proof
  • proof
  • proof
  • ...and 35 more