Provable Privacy with Non-Private Pre-Processing

Yaxi Hu; Amartya Sanyal; Bernhard Schölkopf

TL;DR

This work proposes a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms, and establishes upper bounds on the overall privacy guarantees by utilising two new technical notions: a variant of DP termed Smooth DP and the bounded sensitivity of the pre-processing algorithms.

Abstract

When analysing Differentially Private (DP) machine learning pipelines, the potential privacy cost of data-dependent pre-processing is frequently overlooked in privacy accounting. In this work, we propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms. Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions: a variant of DP termed Smooth DP and the bounded sensitivity of the pre-processing algorithms. In addition to the generic framework, we provide explicit overall privacy guarantees for multiple data-dependent pre-processing algorithms, such as data imputation, quantization, deduplication and PCA, when used in combination with several DP algorithms. Notably, this framework is also simple to implement, allowing direct integration into existing DP pipelines.
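To make the concern concrete, the following minimal sketch shows how a data-dependent pre-processing step can turn two neighboring datasets into datasets that differ in many records. It uses mean imputation, one of the pre-processing algorithms named in the abstract. The helper names `mean_impute` and `hamming` and the toy data are illustrative assumptions, not taken from the paper.

```python
def mean_impute(rows):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [x for x in rows if x is not None]
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in rows]


def hamming(a, b):
    """Count positions at which two equal-length datasets differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Two neighboring datasets: they differ in exactly one record (index 2).
s1 = [1.0, 2.0, 3.0, None, None, None]
s2 = [1.0, 2.0, 9.0, None, None, None]

p1 = mean_impute(s1)  # imputed value is the mean of 1, 2, 3
p2 = mean_impute(s2)  # imputed value is the mean of 1, 2, 9

raw_diff = hamming(s1, s2)        # records differing before pre-processing
processed_diff = hamming(p1, p2)  # records differing after pre-processing
print(raw_diff, processed_diff)
```

Because the imputed value depends on the whole dataset, changing one record also changes every imputed entry, so a privacy analysis that assumes neighboring inputs to the DP algorithm no longer applies directly.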

Paper Structure (57 sections, 43 theorems, 94 equations, 4 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Preliminaries in Differential Privacy
Rényi Differential Privacy
Private mechanisms
Output perturbation
Random sampling
Gradient perturbation
Main Results
Smooth RDP
Properties of SRDP
SRDP parameters for common private mechanisms
Privacy of Pre-Processed DP Pipelines
Comparison with the group privacy or DDP analysis
Privacy Guarantees of Common Pre-Processing Algorithms
...and 42 more sections

Key Result

Lemma 1

Let ${\mathcal{L}}$ be any dataset collection. Then, the following holds.

Figures (4)

Figure 1: Illustration of the privacy analysis: For two neighboring datasets $S_1, S_2$, a pre-processing algorithm $\pi$ yields the pre-processed datasets $\pi_1(S_1)$ and $\pi_2(S_2)$ respectively. A synthetic dataset $\tilde{S}$ is constructed by combining the pre-processed datasets, ensuring that $\tilde{S}$ and $\pi_2(S_2)$ are neighboring datasets and that $\tilde{S}$ and $\pi_1(S_1)$ have bounded $L_{12}$ distance.
Figure 2: Visualization of the overall privacy of pre-processed Gaussian mechanism analysed with group privacy and our bound from Theorem 2. Here, $\eta$ is the distance threshold of the quantization algorithm, $n$ is the size of the possible datasets, and $\delta_{\min}$ is the minimum gap between the $k^{th}$ and the $(k+1)^{th}$ eigenvalue of all possible datasets.
Figure 3: Comparison of excess empirical loss of private logistic regression: for each level of overall privacy $\varepsilon$, the pre-processed DP pipeline consistently outperforms the other methods.
Figure 4: Illustration of the privacy analysis
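As background for the Gaussian mechanism referred to in Figure 2, here is a minimal sketch of its standard Rényi DP accounting without any pre-processing. It relies on two well-known facts from the RDP literature: a Gaussian mechanism with $L_2$ sensitivity $\Delta$ and noise scale $\sigma$ satisfies $(\alpha, \alpha\Delta^2/(2\sigma^2))$-RDP, and $(\alpha, \varepsilon)$-RDP implies $(\varepsilon + \log(1/\delta)/(\alpha - 1), \delta)$-DP. The function names and the grid of orders are illustrative assumptions; this is not the bound from Theorem 2.

```python
import math


def gaussian_rdp(alpha, sensitivity, sigma):
    """RDP parameter of the Gaussian mechanism at order alpha > 1."""
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)


def rdp_to_dp(alpha, rdp_eps, delta):
    """Standard conversion from (alpha, rdp_eps)-RDP to (eps, delta)-DP."""
    return rdp_eps + math.log(1.0 / delta) / (alpha - 1.0)


def best_dp_epsilon(sensitivity, sigma, delta, alphas):
    """Smallest (eps, delta)-DP epsilon over a grid of Renyi orders."""
    return min(
        rdp_to_dp(a, gaussian_rdp(a, sensitivity, sigma), delta)
        for a in alphas
    )


# Illustrative grid of orders 1.5, 2.0, ..., 64.0 (an assumption).
alphas = [1.0 + 0.5 * i for i in range(1, 127)]
eps_small_noise = best_dp_epsilon(1.0, 1.0, 1e-5, alphas)
eps_large_noise = best_dp_epsilon(1.0, 4.0, 1e-5, alphas)
print(eps_small_noise, eps_large_noise)
```

More noise yields a smaller epsilon, and a larger effective sensitivity yields a larger one, which is why the distance between pre-processed datasets matters for the overall guarantee.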

Theorems & Definitions (74)

Definition 1: Rényi Divergence
Definition 2: $(\alpha, \varepsilon(\alpha))$-RDP
Definition 3: $(\alpha, \varepsilon(\alpha, \tau))$-smooth RDP
Lemma 1
Theorem 1: Informal
Definition 4
Theorem 2
Proof: Proof sketch
Proposition 2
Corollary 3
...and 64 more
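
For reference, the first two definitions listed above are standard in the literature and are usually stated as follows. This is a sketch in common notation, where $P$ and $Q$ are distributions, $\mathcal{A}$ is a randomised algorithm, and $S, S'$ are neighboring datasets; the paper's own statements may use different symbols. Smooth RDP (Definition 3) is the paper's new notion and is not reproduced here.

```latex
% Renyi divergence of order \alpha > 1 between distributions P and Q
D_{\alpha}(P \,\|\, Q) = \frac{1}{\alpha - 1}
  \log \mathbb{E}_{x \sim Q}\!\left[\left(\frac{P(x)}{Q(x)}\right)^{\alpha}\right]

% (\alpha, \varepsilon(\alpha))-RDP: for all neighboring datasets S, S'
D_{\alpha}\big(\mathcal{A}(S) \,\|\, \mathcal{A}(S')\big) \le \varepsilon(\alpha)
```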
