Almost Instance-optimal Clipping for Summation Problems in the Shuffle Model of Differential Privacy

Wei Dong; Qiyao Luo; Giulia Fanti; Elaine Shi; Ke Yi

TL;DR

This paper shows that the two seemingly sequential steps of the clipping mechanism in the shuffle model, finding the clipping threshold and computing the noisy sum of the clipped data, can be done simultaneously in one round using just $1+o(1)$ messages per user while maintaining the instance-optimal error bound. It also extends the technique to high-dimensional sum estimation and sparse vector aggregation.

Abstract

Differentially private mechanisms achieving worst-case optimal error bounds (e.g., the classical Laplace mechanism) are well-studied in the literature. However, when typical data are far from the worst case, instance-specific error bounds, which depend on the largest value in the dataset, are more meaningful. For example, consider the sum estimation problem, where each user has an integer $x_i$ from the domain $\{0,1,\dots,U\}$ and we wish to estimate $\sum_i x_i$. This has a worst-case optimal error of $O(U/\varepsilon)$, while recent work has shown that the clipping mechanism can achieve an instance-optimal error of $O(\max_i x_i \cdot \log\log U /\varepsilon)$. Under the shuffle model, known instance-optimal protocols are less communication-efficient. The clipping mechanism also works in the shuffle model, but requires two rounds: round one finds the clipping threshold, and round two does the clipping and computes the noisy sum of the clipped data. In this paper, we show how these two seemingly sequential steps can be done simultaneously in one round using just $1+o(1)$ messages per user, while maintaining the instance-optimal error bound. We also extend our technique to the high-dimensional sum estimation problem and sparse vector aggregation (a.k.a. frequency estimation under user-level differential privacy).
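As a rough illustration of the "clip, then release a noisy sum" step described in the abstract, here is a minimal central-model sketch in Python. It is not the paper's shuffle protocol: the threshold `tau` is assumed to be fixed in advance and independent of the data, whereas the paper's contribution is choosing it privately in the same round. The function names `laplace_noise` and `clipped_noisy_sum` are hypothetical and introduced only for this example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def clipped_noisy_sum(values, tau, epsilon, rng=None):
    """Clip each value to [0, tau], sum, and add Laplace(tau / epsilon) noise.

    Assumption: tau > 0 is chosen without looking at the data. After
    clipping, one user changes the sum by at most tau, so this noise
    scale is the standard Laplace-mechanism calibration for that tau.
    """
    rng = rng or random.Random()
    clipped_total = sum(min(max(x, 0), tau) for x in values)
    return clipped_total + laplace_noise(tau / epsilon, rng)


if __name__ == "__main__":
    data = [3, 7, 2, 5, 900]  # one outlier far above the typical value
    print(clipped_noisy_sum(data, tau=8, epsilon=1.0, rng=random.Random(0)))
```

With a small `tau` the noise scales with `tau / epsilon` rather than `U / epsilon`, at the cost of bias from any values above `tau`. That trade-off is why the choice of threshold matters.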

Paper Structure (26 sections, 9 theorems, 32 equations, 5 figures, 5 tables, 2 algorithms)

Introduction
Our results
Contributions
Related Work
Preliminaries
Differential Privacy
Sum Estimation in Central-DP
Sum Estimation in Shuffle-DP
A Straw-man One-Round Protocol
Domain Compression
Try All Possible $\tau$
Our Protocol
Domain Partitioning
Finding $\tau$ with No Extra Cost
High-Dimensional Sum Estimation
...and 11 more sections

Key Result

Lemma 3.1

If $\mathcal{M}$ satisfies $(\varepsilon,\delta)$-DP and $\mathcal{M}'$ is any randomized mechanism, then $\mathcal{M}'(\mathcal{M}(D))$ satisfies $(\varepsilon,\delta)$-DP.
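One way to read this lemma in practice: any cleanup applied only to an already-released private output costs no additional privacy. The sketch below assumes a noisy sum has already been released by some $(\varepsilon,\delta)$-DP mechanism and that the number of users `n` and the domain bound `U` are public. The function name `postprocess_sum` is hypothetical and not taken from the paper.

```python
def postprocess_sum(noisy_sum, n, U):
    """Round a released noisy sum and clamp it to the feasible range [0, n * U].

    This touches only the mechanism's output and public parameters,
    never the raw data, so by post-processing the result keeps the
    same (epsilon, delta) guarantee as `noisy_sum` itself.
    """
    return min(max(round(noisy_sum), 0), n * U)


if __name__ == "__main__":
    print(postprocess_sum(-3.7, n=5, U=10))  # negative estimates are clamped to 0
    print(postprocess_sum(57.2, n=5, U=10))  # estimates above n * U are clamped down
```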

Figures (5)

Figure 1: An illustration of our protocol for sum estimation. $U=2^{10}$, $\varepsilon=1$, and $\beta = 0.1$.
Figure 2: An illustration of our protocol for high-dimensional sum estimation. $d=8$.
Figure 3: Error levels and average messages per user for the sum estimation mechanisms under shuffle-DP with different data size $n$. $\mathrm{CentralDP}$ represents the state-of-the-art algorithm for sum estimation under central-DP.
Figure 4: Error levels of the mechanisms for sum estimation under shuffle-DP with different value domain $U$.
Figure 5: Error levels of the mechanisms for sum estimation under shuffle-DP with data drawn from a Gaussian distribution with different $\sigma$.

Theorems & Definitions (13)

Definition 1: Differential privacy
Lemma 3.1: Post Processing [dwork2014algorithmic]
Lemma 3.2: Sequential Composition [dwork2014algorithmic]
Lemma 3.3: Parallel Composition [mcsherry2009privacy]
Lemma 3.4: Laplace Mechanism
Lemma 3.5
Theorem 5.1
proof
Lemma 6.1 [ailon2009fast]
Theorem 6.2
...and 3 more
