Augmented Shuffle Protocols for Accurate and Robust Frequency Estimation under Differential Privacy

Takao Murakami, Yuichi Sei, Reo Eriguchi

TL;DR

The paper proposes a generalized framework for local-noise-free protocols, in which users send (encrypted) input data to the shuffler without adding noise. The generalized protocol provides DP and is robust to local data poisoning and collusion attacks if a simpler mechanism that performs the same process on binary input data provides DP.

Abstract

The shuffle model of DP (Differential Privacy) provides high utility by introducing a shuffler that randomly shuffles noisy data sent from users. However, recent studies show that existing shuffle protocols suffer from two major drawbacks. First, they are vulnerable to local data poisoning attacks, which manipulate the statistics about input data by sending crafted data, especially when the privacy budget $\varepsilon$ is small. Second, the actual value of $\varepsilon$ is increased by collusion attacks by the data collector and users. In this paper, we address these two issues by thoroughly exploring the potential of the augmented shuffle model, which allows the shuffler to perform additional operations, such as random sampling and dummy data addition. Specifically, we propose a generalized framework for local-noise-free protocols in which users send (encrypted) input data to the shuffler without adding noise. We show that this generalized protocol provides DP and is robust to the above two attacks if a simpler mechanism that performs the same process on binary input data provides DP. Based on this framework, we propose three concrete protocols providing DP and robustness against the two attacks. Our first protocol generates the number of dummy values for each item from a binomial distribution and provides higher utility than several state-of-the-art existing shuffle protocols. Our second protocol significantly improves the utility of our first protocol by introducing a novel dummy-count distribution: the asymmetric two-sided geometric distribution. Our third protocol is a special case of our second protocol and provides pure $\varepsilon$-DP. We show the effectiveness of our protocols through theoretical analysis and comprehensive experiments.
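
The idea behind the first protocol (users send un-noised values, and the shuffler adds a binomially distributed number of dummy values per item before shuffling) can be illustrated with a minimal Python sketch. This is a simplified illustration under stated assumptions, not the authors' exact protocol: encryption and random sampling are omitted, the success probability 1/2 and the number of trials `num_trials` are assumed values, and the function names are hypothetical.

```python
import random
from collections import Counter


def shuffler(inputs, d, num_trials, rng):
    """Append Binomial(num_trials, 1/2) dummy copies of each item, then shuffle.

    Assumption: the success probability 1/2 and num_trials are illustrative,
    not parameters taken from the paper.
    """
    messages = list(inputs)  # users send their values without local noise
    for item in range(d):
        dummies = sum(rng.random() < 0.5 for _ in range(num_trials))
        messages.extend([item] * dummies)
    rng.shuffle(messages)  # hide which messages are real and who sent them
    return messages


def estimate_frequencies(messages, d, n, num_trials):
    """Subtract the expected dummy count per item and normalize by n."""
    counts = Counter(messages)
    expected_dummies = num_trials / 2
    return [(counts[item] - expected_dummies) / n for item in range(d)]


rng = random.Random(0)
d, n, num_trials = 5, 10_000, 200
inputs = [rng.randrange(d) for _ in range(n)]
shuffled = shuffler(inputs, d, num_trials, rng)
estimates = estimate_frequencies(shuffled, d, n, num_trials)
```

Because the expected number of dummies per item is known, subtracting it keeps the estimate unbiased in this sketch; the only estimation error comes from the variance of the dummy counts, which does not depend on what any user sends.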

Paper Structure

This paper contains 41 sections, 13 theorems, 50 equations, 9 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

Let $\varepsilon_L \in \mathbb{R}_{\ge0}$. Let $D = (x_1, \cdots, x_n) \in [d]^n$. Let $\mathcal{R}: [d] \rightarrow \mathcal{Y}$ be an obfuscation mechanism. Let $\mathcal{M}_S: [d]^n \rightarrow \mathcal{Y}^n$ be a pure shuffle algorithm that given a dataset $D$, outputs shuffled values ... if $\varepsilon_L \leq \log \left(\frac{n}{16 \log (2/\delta)}\right)$ and $g(n,\delta) = \varepsilon_L$ otherwise.
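
To get a feel for the condition $\varepsilon_L \leq \log (\frac{n}{16 \log (2/\delta)})$ in the theorem, the following Python sketch evaluates the threshold for given $n$ and $\delta$. It assumes natural logarithms, and the function names are hypothetical helpers, not part of the paper. Only the condition is computed; the amplified bound itself is not reproduced here.

```python
import math


def amplification_threshold(n, delta):
    """Largest local budget eps_L for which the condition in Theorem 1 holds.

    Assumption: 'log' denotes the natural logarithm.
    """
    return math.log(n / (16 * math.log(2 / delta)))


def in_amplification_regime(eps_local, n, delta):
    """True if eps_L <= log(n / (16 log(2/delta))); otherwise g(n, delta) = eps_L."""
    return eps_local <= amplification_threshold(n, delta)


# Values borrowed from the default settings listed in the figure captions.
threshold = amplification_threshold(10**4, 1e-12)
```

With $n=10^4$ and $\delta=10^{-12}$, the threshold is roughly 3.09, so a local budget of $\varepsilon_L = 1$ satisfies the condition while $\varepsilon_L = 4$ does not.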

Figures (9)

  • Figure 1: Overview of our local-noise-free protocol.
  • Figure 2: Three bounds on the number $M$ of trials in SBin-Shuffle ($\beta=1$). DKMMN08 and ASYKM18 are the bounds in Dwork_EUROCRYPT06 and Agarwal_NeurIPS18, respectively. We set $\varepsilon=1$, $\delta=10^{-12}$, and $d=10^2$ as default values.
  • Figure 3: The asymmetric two-sided geometric distribution $\textsf{AGeo}(\nu,q_l,q_r)$ and its variance $\sigma^2$ (upper bound in (\ref{eq:SAGeo_MSE})) when $\varepsilon=1$ and $\nu=10$.
  • Figure 4: MSE vs. $\varepsilon$ ($\delta = 10^{-12}$, $\beta = 1$).
  • Figure 7: Communication cost $C_{tot}$ (bits). We set $d=100$, $\varepsilon=1$, $n=10^4$, and $\delta = 10^{-12}$ as default values ($\beta=1$, 2048-bit RSA). GRR, OUE, OLH, and RAPPOR have the same $C_{tot}$ because the size of their obfuscated data is at most $d$ bits before encryption and at most $2048$ bits after encryption.
  • ...and 4 more figures

Theorems & Definitions (18)

  • Definition 1: $\Omega$-neighboring databases Beimel_CRYPTO08
  • Definition 2: $(\varepsilon,\delta)$-DP
  • Definition 3: $(\varepsilon,\delta)$-LDP
  • Theorem 1: Privacy amplification by shuffling Feldman_FOCS21
  • Proposition 1
  • Lemma 1
  • Lemma 2
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • ...and 8 more