Statistic Maximal Leakage

Shuaiqi Wang, Zinan Lin, Giulia Fanti

TL;DR

The authors show, theoretically and empirically, that the quantization mechanism achieves better privacy-utility tradeoffs than randomized response in the settings they study. They also show how to efficiently compute statistic maximal leakage in the special case of deterministic data release mechanisms.

Abstract

We introduce a privacy measure called statistic maximal leakage that quantifies how much a privacy mechanism leaks about a specific secret, relative to the adversary's prior information about that secret. Statistic maximal leakage is an extension of the well-known maximal leakage. Unlike maximal leakage, which protects an arbitrary, unknown secret, statistic maximal leakage protects a single, known secret. We show that statistic maximal leakage satisfies composition and post-processing properties. Additionally, we show how to efficiently compute it in the special case of deterministic data release mechanisms. We analyze two important mechanisms under statistic maximal leakage: the quantization mechanism and randomized response. We show theoretically and empirically that the quantization mechanism achieves better privacy-utility tradeoffs in the settings we study.
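The abstract describes statistic maximal leakage as an extension of maximal leakage. For reference, classical maximal leakage from X to Y over finite alphabets has the closed form $\log \sum_{y} \max_{x:\,P_X(x)>0} P_{Y|X}(y|x)$. The sketch below evaluates that classical quantity for k-ary randomized response. It does not compute statistic maximal leakage, whose definition is not reproduced on this page. The k-RR parametrization by ε, and the helper names `maximal_leakage` and `randomized_response`, are illustrative assumptions and may differ from the mechanism as the paper defines it.

```python
import math


def maximal_leakage(channel):
    """Classical maximal leakage log sum_y max_x P(y | x) for a finite channel.

    channel[x][y] = P(y | x). Every x is assumed to have positive prior
    probability, so the max ranges over all rows.
    """
    n_outputs = len(channel[0])
    return math.log(sum(max(row[y] for row in channel) for y in range(n_outputs)))


def randomized_response(k, eps):
    """Assumed k-ary randomized response: report the true value with
    probability e^eps / (e^eps + k - 1), each other value with 1 / (e^eps + k - 1)."""
    denom = math.exp(eps) + k - 1
    return [[math.exp(eps) / denom if y == x else 1 / denom for y in range(k)]
            for x in range(k)]


# eps = 0 makes the output independent of the input (zero leakage);
# larger eps pushes the leakage toward log k.
print(maximal_leakage(randomized_response(4, 0.0)))
print(maximal_leakage(randomized_response(4, 1.0)))
```

Here the maximal leakage depends only on the channel, and the prior enters only through its support. By the abstract's description, statistic maximal leakage differs in targeting a single, known secret relative to the adversary's prior.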

Paper Structure

This paper contains 66 sections, 22 theorems, 112 equations, 9 figures, 1 algorithm.

Key Result

Lemma 3.1

The distortion measure $\Delta_{\mathcal{M}}$ can be rewritten as $\Delta_{\mathcal{M}} =\sup_{\theta} \mathbb{E}_{\Theta'=\mathcal{M}\left( \theta \right)}[D_{\text{TV}}\left( \nu_{X_{\theta}}\|\nu_{Y_{\theta'}} \right)].$
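The following is a minimal sketch of Lemma 3.1 for a finite parameter set. It assumes, as in Figure 3, that $\nu_{X_\theta}$ is a categorical distribution with parameter vector $\theta$. Under that assumption the supremum becomes a maximum and $D_{\text{TV}}$ is half the $L_1$ distance between parameter vectors. The matrix encoding of $\mathcal{M}$ and the function names are illustrative assumptions, not the paper's code.

```python
def tv_distance(p, q):
    """Total variation distance between two categorical distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def distortion(thetas, outputs, mechanism):
    """Delta_M = max_theta E_{theta' ~ M(theta)}[ D_TV(nu_theta || nu_theta') ].

    thetas:    finite list of input parameter vectors theta
    outputs:   list of released parameter vectors theta'
    mechanism: mechanism[i][k] = P(theta'_k | theta_i)
    """
    return max(
        sum(p_k * tv_distance(theta, outputs[k]) for k, p_k in enumerate(row))
        for theta, row in zip(thetas, mechanism)
    )


thetas = [[0.2, 0.8], [0.6, 0.4]]
# Releasing theta unchanged incurs no distortion.
print(distortion(thetas, thetas, [[1, 0], [0, 1]]))
# Quantizing both parameters to [0.4, 0.6]: each is 0.2 away in TV,
# so the distortion is about 0.2 (up to floating-point rounding).
print(distortion(thetas, [[0.4, 0.6]], [[1], [1]]))
```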

Figures (9)

  • Figure 1: Given a mechanism $\mathcal{M}=\mathbb{P}_{\Theta'|\Theta}$, the left subfigure shows a policy matrix. For each column $j$, the red-outlined region indicates rows of parameters with secret $g$ maximizing $\mathbb{P}_{\Theta'|\Theta}\left( \theta'_j|\theta_g \right)$. The blue cell lies in the row of $\theta_g$. When the mechanism $\mathcal{M}$ is deterministic, SML calculation can be converted to a min-cost flow problem (right). The constructed directed graph contains three columns of nodes (representing $G, \Theta, \Theta'$ respectively) between the source and sink nodes. All edges have capacity $1$, and only the edges between nodes in the $\Theta$ and $\Theta'$ columns have non-zero cost ($-\mathbb{P}_{\Theta'|\Theta}\left( \theta'_k|\theta_j \right)$ between $\theta_j$ and $\theta'_k$). Edges are annotated as: Capacity (Cost).
  • Figure 2: Relation between $\textcolor{red}{\mathbf{\Gamma}}$, $\mathbf{\Gamma}^*$, $\textcolor{brown}{\hat{\mathbf{\Gamma}}^*}$, $\textcolor{blue}{\hat{\mathbf{\Gamma}}^*_0}$ and $\textcolor{teal}{\hat{\mathbf{\Gamma}}^*_1}$. $\mathbf{\Gamma}^*$ is the set depicted in black, containing all feasible attribute combinations. $\textcolor{red}{\mathbf{\Gamma}}$ in red is the set of attribute combinations existing in the dataset $\mathbf{D}$, and $\textcolor{red}{\mathbf{\Gamma}}\subseteq \mathbf{\Gamma}^*$. $\textcolor{brown}{\hat{\mathbf{\Gamma}}^*}$ in dark yellow is the data holder's estimate of $\mathbf{\Gamma}^*$; it may contain both feasible attribute combinations, as shown in the blue sub-region with notation $\textcolor{blue}{\hat{\mathbf{\Gamma}}^*_0}$, and infeasible attribute combinations, as shown in the green sub-region with notation $\textcolor{teal}{\hat{\mathbf{\Gamma}}^*_1}$.
  • Figure 3: For a tabular dataset, we can generate a categorical distribution with each category corresponding to the fraction of records with a given combination of attribute values. $\theta$ is the parameter of the constructed categorical distribution.
  • Figure 4: Under attribute privacy, the privacy level of RR does not vary with the number of possible secrets, indicating that attribute privacy cannot capture how the size of the secret's value space affects information leakage; the distortion of RR stays at its trivial upper bound of 1 until the attribute privacy value exceeds 44,320, indicating that attribute privacy is a much more conservative metric.
  • Figure 5: Privacy-utility trade-offs of RR, QM, and MaxL when the secret is the fraction of an arbitrary category.
  • ...and 4 more figures
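Figure 1 (right) describes how SML for a deterministic mechanism reduces to a min-cost flow problem. The sketch below builds that graph as the caption describes it. It then solves the graph with a textbook successive-shortest-path algorithm, using Bellman-Ford because the costs are negative. The sketch makes two assumptions:

- each secret node $g$ connects exactly to the parameter nodes $\theta_j$ whose secret is $g$;
- the flow amount is left free, so augmentation continues only while it lowers the cost.

The names `min_cost_flow` and `flow_objective` are hypothetical. The sketch returns only the optimal value of the flow problem. How that value maps to SML (presumably the content of Proposition 4.2) is not reproduced here.

```python
def min_cost_flow(n, edges, source, sink):
    """Successive-shortest-path min-cost flow using Bellman-Ford.

    edges: (u, v, capacity, cost) tuples on nodes 0..n-1; costs may be
    negative if the input graph has no negative cycle. Stops once the
    cheapest residual source-sink path is not negative, i.e. returns the
    minimum cost over all feasible flow values as (flow_value, total_cost).
    """
    # Residual graph: edge e and its reverse e ^ 1 are stored as a pair.
    head, cap, cost, adj = [], [], [], [[] for _ in range(n)]
    for u, v, c, w in edges:
        adj[u].append(len(head)); head.append(v); cap.append(c); cost.append(w)
        adj[v].append(len(head)); head.append(u); cap.append(0); cost.append(-w)
    flow = total = 0
    while True:
        # Bellman-Ford shortest paths from the source in the residual graph.
        dist, via = [float("inf")] * n, [-1] * n
        dist[source] = 0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for e in adj[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[head[e]]:
                        dist[head[e]] = dist[u] + cost[e]
                        via[head[e]] = e
                        changed = True
            if not changed:
                break
        if dist[sink] >= 0:  # also covers an unreachable sink (inf)
            break
        # Bottleneck capacity along the path, then push that much flow.
        push, v = float("inf"), sink
        while v != source:
            push = min(push, cap[via[v]])
            v = head[via[v] ^ 1]
        v = sink
        while v != source:
            cap[via[v]] -= push
            cap[via[v] ^ 1] += push
            v = head[via[v] ^ 1]
        flow += push
        total += push * dist[sink]
    return flow, total


def flow_objective(secret_of, P):
    """Build the Figure 1 graph and return -(minimum cost).

    secret_of[j]: secret value g of parameter theta_j (hypothetical encoding)
    P[j][k]:      P(theta'_k | theta_j); deterministic rows hold a single 1.
    """
    secrets = sorted(set(secret_of))
    m, K, G = len(secret_of), len(P[0]), len(secrets)
    # Node ids: 0 = source, G secret nodes, m parameter nodes, K output nodes, sink.
    source, sink = 0, G + m + K + 1
    g_node = {g: 1 + i for i, g in enumerate(secrets)}

    def theta_node(j):
        return G + 1 + j

    def out_node(k):
        return G + m + 1 + k

    edges = [(source, g_node[g], 1, 0) for g in secrets]
    for j, g in enumerate(secret_of):
        edges.append((g_node[g], theta_node(j), 1, 0))
        edges += [(theta_node(j), out_node(k), 1, -P[j][k]) for k in range(K)]
    edges += [(out_node(k), sink, 1, 0) for k in range(K)]
    _, total_cost = min_cost_flow(sink + 1, edges, source, sink)
    return -total_cost


# theta_0, theta_1 carry secret 0; theta_2, theta_3 carry secret 1.
print(flow_objective([0, 0, 1, 1], [[1, 0], [1, 0], [0, 1], [0, 1]]))  # 2
print(flow_objective([0, 0, 1, 1], [[1, 0]] * 4))                      # 1
```

In the first example the released output identifies the secret, so each secret value can be routed to a distinct output through a cost $-1$ edge. In the second, every parameter is released as $\theta'_0$, so only one unit of negative-cost flow is possible.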

Theorems & Definitions (38)

  • Lemma 3.1
  • Proposition 4.1
  • Proposition 4.2: SML computation, deterministic mechanism
  • Theorem 4.1: Hardness of SML computation
  • Theorem 4.2: Adaptive Composition
  • Theorem 4.3: Post-Processing
  • Theorem 5.1
  • Definition 1: Robustness to support mismatch
  • Proposition 5.1
  • Lemma A.1
  • ...and 28 more