Robust Privatization with Multiple Tasks and the Optimal Privacy-Utility Tradeoff

Ta-Yuan Liu, I-Hsiang Wang

TL;DR

The paper tackles robust data privatization when the target task is unknown and only a set of possible tasks is given, formulating an information-theoretic rate-leakage-distortion problem with a prior-dependent privacy metric. For data made of independent components, it proves that privatizing each component in parallel is optimal, which reduces the multi-task problem to a collection of parallel privacy-funnel (PF) subproblems plus a linear program (LP) that determines the component weightings. Under log-loss utility, each PF subproblem has a leakage-free threshold τ_i=H(X_i)−H(S_i): below it the minimum leakage is zero, and above it the leakage grows linearly with the required utility. The work also provides a DP-S extension and characterizes a released rate that is sufficient to achieve the minimum leakage, and its numerical results illustrate robustness to task non-specificity. Together, these results offer a principled, scalable approach to privatization in multi-task settings.
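
The threshold expression can be motivated in two steps. The following is a sketch rather than the paper's proof; it assumes that $S_i$ is a deterministic function of $X_i$, that leakage is measured by $I(S_i;Y_i)$ and log-loss utility by $I(X_i;Y_i)$, and it introduces $Y_i$ as an assumed symbol for the released version of component $i$.

$$
\tau_i = H(X_i) - H(S_i) = H(X_i, S_i) - H(S_i) = H(X_i \mid S_i),
$$

$$
I(X_i;Y_i) = I(X_i, S_i; Y_i) = I(S_i;Y_i) + I(X_i;Y_i \mid S_i) \le I(S_i;Y_i) + H(X_i \mid S_i).
$$

Rearranging gives $I(S_i;Y_i) \ge I(X_i;Y_i) - \tau_i$, so under these assumptions no mechanism can meet a utility level above $\tau_i$ without leaking at least the excess.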

Abstract

In this work, fundamental limits and optimal mechanisms of privacy-preserving data release that aims to minimize the privacy leakage under utility constraints of a set of multiple tasks are investigated. While the private feature to be protected is typically determined and known by the sanitizer, the target task is usually unknown. To address the lack of information on the specific task, utility constraints laid on a set of multiple possible tasks are considered. The mechanism protects the specific privacy feature of the to-be-released data while satisfying utility constraints of all possible tasks in the set. First, the single-letter characterization of the rate-leakage-distortion region is derived, where the utility of each task is measured by a distortion function. It turns out that the minimum privacy leakage problem with log-loss distortion constraints and the unconstrained released rate is a non-convex optimization problem. Second, focusing on the case where the raw data consists of multiple independent components, we show that the above non-convex optimization problem can be decomposed into multiple parallel privacy funnel (PF) problems with different weightings. We explicitly derive the optimal solution to each PF problem when the private feature is a component-wise deterministic function of a data vector. The solution is characterized by a leakage-free threshold: when the utility constraint is below the threshold, the minimum leakage is zero; once the required utility level is above the threshold, the privacy leakage increases linearly. Finally, we show that the optimal weighting of each privacy funnel problem can be found by solving a linear program (LP). A sufficient released rate to achieve the minimum leakage is also derived. Numerical results are shown to illustrate the robustness of our approach against the task non-specificity.
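
The leakage-free threshold described in the abstract can be illustrated numerically. The sketch below is not the authors' code: the toy distribution, the parity feature, and the function names are hypothetical, and the unit slope above the threshold is an assumption taken from the chain-rule bound $I(S;Y) \ge I(X;Y) - H(X|S)$ rather than from the abstract, which only states that the growth is linear.

```python
import math
from collections import defaultdict

def entropy(pmf):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def pushforward(pmf_x, f):
    """pmf of S = f(X) when S is a deterministic function of X."""
    pmf_s = defaultdict(float)
    for x, p in pmf_x.items():
        pmf_s[f(x)] += p
    return dict(pmf_s)

def leakage_free_threshold(pmf_x, f):
    """tau = H(X) - H(S), which equals H(X|S) for deterministic S = f(X)."""
    return entropy(pmf_x) - entropy(pushforward(pmf_x, f))

def leakage_lower_bound(utility, tau):
    """Chain-rule bound (assumed slope 1): leakage >= max(0, utility - tau)."""
    return max(0.0, utility - tau)

# Hypothetical component: X uniform on {0,...,7}, private feature S = parity of X.
pmf_x = {x: 1 / 8 for x in range(8)}
tau = leakage_free_threshold(pmf_x, lambda x: x % 2)
curve = [(u, leakage_lower_bound(u, tau)) for u in (0.0, 1.0, 2.0, 2.5, 3.0)]
print(tau, curve)
```

Here $H(X)=3$ bits and $H(S)=1$ bit, so the threshold is 2 bits: utility levels up to 2 bits incur no leakage in this toy case, and each additional bit of required utility costs at least one bit of leakage.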

Paper Structure

This paper contains 28 sections, 9 theorems, 72 equations, 5 figures.

Key Result

Lemma 1

The optimal rate-leakage-distortion region $\mathcal{R}$ is the collection of $(R,L,D_1,\ldots,D_K) \in \mathbb{R}^{K+2}_+$ satisfying for some $p_{Y,\hat{C}_1,\ldots,\hat{C}_K|X}(y,\hat{c}_1,\ldots,\hat{c}_K|x)$ where form a Markov chain, and $|\mathcal{Y}|\leq (|\mathcal{X}|+1) \cdot\prod_{k=1}^K |\hat{\mathcal{C}}_k|$.
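
To make the quantities in this lemma concrete, the sketch below evaluates one candidate mechanism $p_{Y|X}$ on a toy source. It assumes, as is common in this literature but not recoverable from the statement above, that the released rate is measured by $I(X;Y)$ and the leakage by $I(S;Y)$. The source, the mechanism, and all function names are hypothetical; only the alphabet-size bound is taken from the lemma.

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """I(A;B) in bits from a dict {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

def joint_with_release(pmf_x, mechanism, feature=lambda x: x):
    """Joint pmf of (feature(X), Y), where Y ~ mechanism[x] given X = x."""
    joint = defaultdict(float)
    for x, px in pmf_x.items():
        for y, pyx in mechanism[x].items():
            joint[(feature(x), y)] += px * pyx
    return dict(joint)

def cardinality_bound(size_x, sizes_c_hat):
    """(|X| + 1) * prod_k |C_hat_k|, the alphabet bound quoted in Lemma 1."""
    return (size_x + 1) * math.prod(sizes_c_hat)

# Hypothetical toy source: X = (a, b) uniform on {0,1} x {0,1,2,3}, with S = a.
pmf_x = {(a, b): 1 / 8 for a in range(2) for b in range(4)}
# Hypothetical mechanism: release b exactly and hide a.
mechanism = {x: {x[1]: 1.0} for x in pmf_x}

rate = mutual_information(joint_with_release(pmf_x, mechanism))
leakage = mutual_information(joint_with_release(pmf_x, mechanism, lambda x: x[0]))
print(rate, leakage, cardinality_bound(len(pmf_x), [2, 2]))
```

In this toy case the mechanism releases 2 bits about $X$ while revealing nothing about $S$, and with $|\mathcal{X}|=8$ and two binary reconstruction alphabets the bound allows a released alphabet of up to 36 symbols.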

Figures (5)

  • Figure 1: The information-theoretic multi-letter formulation of the privacy-preserving data release problem with $K$ possible tasks $\{C_1,...,C_K\}$ and a single private feature $S$.
  • Figure 2: An example of a data set $X^n$. Each row represents a patient's entry, and different patients' entries are assumed to be i.i.d. It is also assumed that each patient's entry consists of $N$ independent attributes. In short, the $i$-th row is the $i$-th patient's entry, and it is a vector $X(i)=[X_1(i),...,X_N(i)]$, $i=1,2,...,n$. The private feature $S$ in this example could be the disease(s) of the patient or the income range (both are deterministic functions of $X$).
  • Figure 3: The privacy-utility tradeoff for task $C_1$ under privatization based on different possible task sets.
  • Figure 4: The effect of the possible task set on the utility of each task under a fixed privacy leakage.
  • Figure 5: For a given privacy leakage constraint (e.g., $10$), the utility of all tasks containing $3$ components of $X$ under different choices of $\mathcal{T}$.

Theorems & Definitions (17)

  • Definition 1
  • Definition 2
  • Definition 3: Minimum privacy leakage
  • Lemma 1
  • Remark 1
  • Remark 2
  • Theorem 1
  • Remark 3
  • Corollary 1
  • Theorem 2
  • ...and 7 more