Tight Differential Privacy Guarantees for the Shuffle Model with $k$-Randomized Response

Sayan Biswas, Kangsoo Jung, Catuscia Palamidessi

TL;DR

This work derives the tight $(\epsilon,\delta)$-adaptive differential privacy bound for the shuffle model when using a $k$-Randomized Response local randomizer, advancing beyond existing bounds by leveraging Sommer et al.'s ADP framework. It provides a closed-form expression for the DP leakage $\delta(\epsilon)$ via a detailed analysis of histogram queries and the associated privacy-loss distribution, showing that the shuffle model can offer stronger privacy guarantees than previously known for the same local noise level. The authors then compare the utility of the shuffle model against the central Gaussian mechanism under the same privacy parameters, using matrix inversion post-processing to de-noise shuffled histograms and assess total variation distance to the true distribution. Across synthetic and Gowalla real-data experiments, the central model typically yields slightly better utility, but the discrepancy is small, particularly for smaller sample sizes or with post-processing, indicating that shuffle models can achieve near-central utility with strong privacy guarantees. This work thus strengthens the theoretical and empirical case for the shuffle model as a practical DP framework with favorable privacy-utility trade-offs for histogram releases.

Abstract

Most differentially private (DP) algorithms assume either a central model, in which a trusted third party adds noise to queries made on datasets, or a local model, in which users perturb their data locally. However, the central model has a single point of failure, and in the local model the utility of the data deteriorates significantly. The recently proposed shuffle model is an intermediate framework between the central and local paradigms: users send their locally privatized data to a server where the messages are shuffled, effacing the link between a privatized message and the corresponding user. This gives a better trade-off between privacy and utility than the local model, since privacy is amplified without adding more noise. In this paper, we theoretically derive the strictest known bound for the DP guarantee of shuffle models with $k$-Randomized Response ($k$-RR) local randomizers. We then focus on the utility of the shuffle model for histogram queries. Leveraging the matrix inversion method, which is used to approximate the original distribution from the empirical one produced by the $k$-RR mechanism, we de-noise the histogram produced by the shuffle model and evaluate the total variation distance of the resulting histogram from the true one, which we regard as the measure of utility of the privacy mechanism. We perform experiments on both synthetic and real data to compare the privacy-utility trade-off of the shuffle model with that of the central model privatized by adding state-of-the-art Gaussian noise to each bin. Although the experimental results are consistent with the literature, which favours the central model, we see that the difference in statistical utility between the central and shuffle models is very small, showing that they are almost comparable under the same level of DP.
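The pipeline described in the abstract ($k$-RR local randomization, shuffling, matrix-inversion de-noising, and total variation distance as the utility measure) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the local privacy parameter name `eps0`, and the encoding of the domain as integers `0..k-1` are assumptions made for illustration, and the standard $k$-RR probabilities $e^{\epsilon_0}/(e^{\epsilon_0}+k-1)$ and $1/(e^{\epsilon_0}+k-1)$ are used.

```python
import numpy as np

def krr_matrix(k, eps0):
    """k x k channel matrix C with C[i, j] = P(report j | true value i)."""
    p = np.exp(eps0) / (np.exp(eps0) + k - 1)  # probability of reporting the truth
    q = 1.0 / (np.exp(eps0) + k - 1)           # probability of each other value
    C = np.full((k, k), q)
    np.fill_diagonal(C, p)
    return C

def krr_shuffle(data, k, eps0, rng):
    """Apply k-RR to every user's value, then shuffle the reports."""
    data = np.asarray(data)
    n = len(data)
    p = np.exp(eps0) / (np.exp(eps0) + k - 1)
    keep = rng.random(n) < p
    # A non-zero offset modulo k picks uniformly among the k-1 other values.
    offsets = rng.integers(1, k, size=n)
    reports = np.where(keep, data, (data + offsets) % k)
    # The shuffler outputs a uniformly random permutation of the reports.
    return rng.permutation(reports)

def invert_histogram(noisy_hist, C):
    """Matrix inversion estimate: solve C^T x = q for the original distribution.

    The result sums to 1 but may contain small negative entries.
    """
    q = np.asarray(noisy_hist, dtype=float)
    q = q / q.sum()
    return np.linalg.solve(C.T, q)

def tv_distance(p, q):
    """Total variation distance between two distributions on the same domain."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, eps0, n = 5, 1.0, 10_000
    true_dist = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
    data = rng.choice(k, size=n, p=true_dist)

    reports = krr_shuffle(data, k, eps0, rng)
    noisy_hist = np.bincount(reports, minlength=k)
    estimate = invert_histogram(noisy_hist, krr_matrix(k, eps0))

    print("TV (noisy vs true):   ", tv_distance(noisy_hist / n, true_dist))
    print("TV (inverted vs true):", tv_distance(estimate, true_dist))
```

Because a histogram query is invariant under permutation, the shuffling step does not change the histogram itself; it only matters for the privacy analysis. The sketch leaves the inverted estimate unprojected, so a practical use would need to decide how to handle negative entries.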

Paper Structure

This paper contains 16 sections, 2 theorems, 17 equations, 3 figures, and 5 tables.

Key Result

Theorem

For any $\epsilon>0$, we get the tight $(\epsilon,\,\delta)$-ADP guarantee for $\mathcal{M}$ with respect to $x_0,\,x_1$ iff $\delta=\hat{\delta}(\epsilon)$ as in eq:tightdelta of Definition finaldelta where

Figures (3)

  • Figure 1: (a) and (b): Location data from Gowalla check-ins from a northern part of San Francisco and a part of Paris. (c) and (d) give the heatmap of the locations in the areas of San Francisco and Paris as an alternative visualization.
  • Figure 2: Estimation of the original distribution from the noisy data obfuscated with the Gaussian mechanism and the shuffle model (SM) on the San Francisco and Paris datasets.
  • Figure 3: Comparison of community-level utilities between Gaussian, shuffle, and shuffle+INV for varying $n$ and $\epsilon$ on the San Francisco and Paris datasets.

Theorems & Definitions (15)

  • Definition: Differential privacy [DworkDifferentialPrivacy]
  • Definition: Adaptive differential privacy [sommer2019privacy]
  • Remark
  • Definition: Tight DP (or ADP) [sommer2019privacy]
  • Definition: Local differential privacy [duchi2013local]
  • Definition: k-Randomized Response [kairouz2016discrete]
  • Definition: Shuffle model [erlingsson2019amplification]
  • Definition: Histogram query [balcer2019separating]
  • Definition: Privacy loss random variable [sommer2019privacy]
  • Definition: Privacy loss distribution [sommer2019privacy]
  • ...and 5 more