Locally Private Sampling with Public Data

Behnoosh Zamanlooy, Mario Diaz, Shahab Asoodeh

TL;DR

This work fully characterizes the minimax optimal mechanisms for general $f$-divergences, provided that $p$ and $q$ are discrete distributions, and demonstrates that the optimal mechanism is universal across all $f$-divergences.

Abstract

Local differential privacy (LDP) is increasingly employed in privacy-preserving machine learning to protect user data before sharing it with an untrusted aggregator. Most LDP methods assume that users possess only a single data record, which is a significant limitation since users often gather extensive datasets (e.g., images, text, time-series data) and frequently have access to public datasets. To address this limitation, we propose a locally private sampling framework that leverages both the private and public datasets of each user. Specifically, we assume each user has two distributions: $p$ and $q$ that represent their private dataset and the public dataset, respectively. The objective is to design a mechanism that generates a private sample approximating $p$ while simultaneously preserving $q$. We frame this objective as a minimax optimization problem using $f$-divergence as the utility measure. We fully characterize the minimax optimal mechanisms for general $f$-divergences provided that $p$ and $q$ are discrete distributions. Remarkably, we demonstrate that this optimal mechanism is universal across all $f$-divergences. Experiments validate the effectiveness of our minimax optimal sampler compared to the state-of-the-art locally private sampler.
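To make the utility measure concrete, the following is a minimal sketch of how an $f$-divergence between two discrete distributions can be computed, using the standard definition $D_f(p \| q) = \sum_x q(x) f(p(x)/q(x))$. The function names and the example vectors `p` and `q` are illustrative assumptions, not taken from the paper, and the sketch does not implement the paper's optimal mechanism.

```python
import math

def f_divergence(p, q, f):
    """Standard f-divergence D_f(p || q) = sum_x q(x) * f(p(x) / q(x)).

    Assumes p and q are lists of probabilities over the same finite
    alphabet and that q(x) > 0 for every x (hypothetical helper).
    """
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

def tv_generator(t):
    # f(t) = |t - 1| / 2 yields the total variation (TV) distance.
    return 0.5 * abs(t - 1.0)

def kl_generator(t):
    # f(t) = t * log(t) yields the KL divergence (convention: 0 * log 0 = 0).
    return t * math.log(t) if t > 0 else 0.0

# Illustrative distributions (assumed values, not from the paper).
p = [0.7, 0.2, 0.1]     # stands in for a user's private distribution
q = [0.25, 0.25, 0.5]   # stands in for the public distribution

tv = f_divergence(p, q, tv_generator)
kl = f_divergence(p, q, kl_generator)
```

Different choices of the convex generator `f` recover different utility measures from the same formula, which is why a mechanism that is optimal for every $f$ at once is notable.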

Paper Structure

This paper contains 30 sections, 11 theorems, 71 equations, 12 figures, 1 algorithm.

Key Result

Theorem 1

Let $q$ be a finitely supported distribution and $q_{\min} \coloneqq \min_{x \in {\mathcal{X}}} q(x)$. Then,

Figures (12)

  • Figure 1: Visualization of a single recursion step in Algorithm 1.
  • Figure 2: Comparison of our private sampling approach with the relative mollifier sampling framework using a simulated public prior. The local distribution $p$ is constructed by fixing $p^1$ and setting the remaining entries to be identical so that $p$ sums to 1. As $p^1$ increases, $p$ approaches a Dirac distribution, the regime in which our method is expected to outperform the baseline. Average ${\mathsf {TV}}$-distance and ${\mathsf {KL}}$-divergence are reported over 10 runs for $n = 100$ and $\varepsilon = 8$.
  • Figure 3: Comparison of our private sampling method with the relative mollifier sampling framework for inferring the next website for an AdTech company to display ads on. For each user, we first compute the ${\mathsf {TV}}$-distance between their local distribution and the corresponding sampling distribution. The figure then reports the maximum ${\mathsf {TV}}$-distance observed within each Subcategory for $\varepsilon = 12$.
  • Figure 4: Comparison of our private sampling method with the relative mollifier sampling framework for inferring the genre of the next movie users are likely to watch. For each user, we first compute the ${\mathsf {TV}}$-distance between their local distribution and the corresponding sampling distribution. The figure then reports the maximum ${\mathsf {TV}}$-distance observed within each age group for $\varepsilon = 5$.
  • Figure 5: One iteration of the recursion of Algorithm 1.
  • ...and 7 more figures

Theorems & Definitions (25)

  • Definition 1: Mollifiers, husain2020local
  • Definition 2: Relative Mollifiers, husain2020local
  • Definition 3: Private Samplers, husain2020local
  • Definition 4
  • Theorem 1: Optimal Utility for $f$-divergences
  • Corollary 1: Optimal Utility for ${\mathsf {TV}}$-distance
  • Proposition 1: Optimal Binary Mechanism
  • Lemma 1
  • Theorem 2: Optimal Mechanisms
  • Corollary 2
  • ...and 15 more