
Provable Privacy with Non-Private Pre-Processing

Yaxi Hu, Amartya Sanyal, Bernhard Schölkopf

TL;DR

This work proposes a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms, and establishes upper bounds on the overall privacy guarantees by utilising two new technical notions: a variant of DP termed Smooth DP and the bounded sensitivity of the pre-processing algorithms.

Abstract

When analysing Differentially Private (DP) machine learning pipelines, the potential privacy cost of data-dependent pre-processing is frequently overlooked in privacy accounting. In this work, we propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms. Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions: a variant of DP termed Smooth DP and the bounded sensitivity of the pre-processing algorithms. In addition to the generic framework, we provide explicit overall privacy guarantees for multiple data-dependent pre-processing algorithms, such as data imputation, quantization, deduplication and PCA, when used in combination with several DP algorithms. Notably, this framework is also simple to implement, allowing direct integration into existing DP pipelines.
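To see why data-dependent pre-processing can carry a privacy cost, consider mean imputation. The sketch below is a toy illustration rather than the paper's framework; the helper names `mean_impute` and `hamming` and the example datasets are assumptions made here. It shows two datasets that differ in a single record becoming datasets that differ in several records once missing values are filled with the data-dependent mean.

```python
from typing import List, Optional


def mean_impute(data: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [x for x in data if x is not None]
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in data]


def hamming(a: list, b: list) -> int:
    """Number of positions at which two equal-length datasets differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Two neighboring datasets: they differ only in the last record.
s1 = [1.0, None, None, 3.0]
s2 = [1.0, None, None, 5.0]

p1 = mean_impute(s1)  # imputed value is (1 + 3) / 2 = 2.0
p2 = mean_impute(s2)  # imputed value is (1 + 5) / 2 = 3.0

print(hamming(s1, s2))  # distance before pre-processing
print(hamming(p1, p2))  # distance after pre-processing
```

Here the raw datasets are at distance 1, while the imputed datasets are at distance 3, because the changed record also shifts every imputed entry. A DP guarantee stated for neighboring inputs therefore does not transfer directly to the pre-processed data, which is the gap the framework above is meant to account for.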

Paper Structure

This paper contains 57 sections, 43 theorems, 94 equations, 4 figures, 4 tables, 1 algorithm.

Key Result

Lemma 1

Let ${\mathcal{L}}$ be any dataset collection. Then, the following holds.

Figures (4)

  • Figure 1: Illustration of the privacy analysis: For two neighboring datasets $S_1, S_2$, a pre-processing algorithm $\pi$ yields the pre-processed datasets $\pi_1(S_1)$ and $\pi_2(S_2)$ respectively. A synthetic dataset $\tilde{S}$ is constructed by combining the pre-processed datasets, ensuring that $\tilde{S}$ and $\pi_2(S_2)$ are neighboring datasets and that $\tilde{S}$ and $\pi_1(S_1)$ have bounded $L_{12}$ distance.
  • Figure 2: Visualization of the overall privacy of the pre-processed Gaussian mechanism, analysed with group privacy and with our bound from Theorem 2. Here, $\eta$ is the distance threshold of the quantization algorithm, $n$ is the size of the possible datasets, and $\delta_{\min}$ is the minimum gap between the $k^{th}$ and $(k+1)^{th}$ eigenvalues over all possible datasets.
  • Figure 3: Comparison of excess empirical loss of private logistic regression: for each level of overall privacy $\varepsilon$, the pre-processed DP pipeline consistently outperforms the other methods.
  • Figure 4: Illustration of the privacy analysis

Theorems & Definitions (74)

  • Definition 1: Rényi Divergence
  • Definition 2: $(\alpha, \varepsilon(\alpha))$-RDP
  • Definition 3: $(\alpha, \varepsilon(\alpha, \tau))$-smooth RDP
  • Lemma 1
  • Theorem 1: Informal
  • Definition 4
  • Theorem 2
  • Proof: Proof sketch
  • Proposition 2
  • Corollary 3
  • ...and 64 more
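
For orientation, the first two definitions listed above are usually stated as follows in the Rényi DP literature. This is the standard textbook form, and the paper's own statements may differ in notation. For distributions $P$ and $Q$ and an order $\alpha > 1$, the Rényi divergence is

$$D_\alpha(P \,\|\, Q) = \frac{1}{\alpha - 1} \log \mathbb{E}_{x \sim Q}\left[\left(\frac{P(x)}{Q(x)}\right)^{\alpha}\right],$$

and a randomized algorithm $\mathcal{A}$ is $(\alpha, \varepsilon(\alpha))$-RDP if, for every pair of neighboring datasets $S, S'$,

$$D_\alpha\big(\mathcal{A}(S) \,\|\, \mathcal{A}(S')\big) \le \varepsilon(\alpha).$$

As a standard example, the Gaussian mechanism with $\ell_2$-sensitivity $\Delta$ and noise scale $\sigma$ satisfies $(\alpha, \alpha \Delta^2 / (2\sigma^2))$-RDP.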