Tight Differential Privacy Guarantees for the Shuffle Model with $k$-Randomized Response

Sayan Biswas; Kangsoo Jung; Catuscia Palamidessi

TL;DR

This work derives the tight $(\epsilon,\delta)$-adaptive differential privacy bound for the shuffle model when using a $k$-Randomized Response local randomizer, advancing beyond existing bounds by leveraging Sommer et al.'s ADP framework. It provides a closed-form expression for the DP leakage $\delta(\epsilon)$ via a detailed analysis of histogram queries and the associated privacy-loss distribution, showing that the shuffle model can offer stronger privacy guarantees than previously known for the same local noise level. The authors then compare the utility of the shuffle model against the central Gaussian mechanism under the same privacy parameters, using matrix inversion post-processing to de-noise shuffled histograms and assess total variation distance to the true distribution. Across synthetic and Gowalla real-data experiments, the central model typically yields slightly better utility, but the discrepancy is small, particularly for smaller sample sizes or with post-processing, indicating that shuffle models can achieve near-central utility with strong privacy guarantees. This work thus strengthens the theoretical and empirical case for the shuffle model as a practical DP framework with favorable privacy-utility trade-offs for histogram releases.

Abstract

Most differentially private (DP) algorithms assume either a central model, in which a trusted third party adds noise to the answers of queries made on datasets, or a local model, in which users perturb their data locally. However, the central model is vulnerable because it has a single point of failure, and in the local model the utility of the data deteriorates significantly. The recently proposed shuffle model is an intermediate framework between the central and the local paradigms: users send their locally privatized data to a server where the messages are shuffled, effacing the link between a privatized message and the corresponding user. This gives a better trade-off between privacy and utility than the local model, as privacy is amplified without adding more noise. In this paper, we theoretically derive the strictest known bound on the DP guarantee for shuffle models with $k$-Randomized Response ($k$-RR) local randomizers. We then focus on the utility of the shuffle model for histogram queries. Leveraging the matrix inversion method, which is used to approximate the original distribution from the empirical one produced by the $k$-RR mechanism, we de-noise the histogram produced by the shuffle model and evaluate the total variation distance of the resulting histogram from the true one, which we regard as the measure of utility of the privacy mechanism. We perform experiments on both synthetic and real data to compare the privacy-utility trade-off of the shuffle model with that of the central model privatized by adding state-of-the-art Gaussian noise to each bin. Although the experimental results are consistent with the literature, which favours the central model, we see that the difference in statistical utility between the central and the shuffle models is very small, showing that they are almost comparable under the same level of DP.
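The pipeline described in the abstract (local $k$-RR, shuffling, matrix inversion, total variation distance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the symbol `eps0` for the local privacy parameter, and the example distribution are all hypothetical, and the paper's exact post-processing may differ. Note that the raw inverted estimate sums to 1 but can contain small negative entries.

```python
import numpy as np

def krr_matrix(k, eps0):
    """k x k channel of k-RR: C[i, j] = P(report j | true value i)."""
    p = np.exp(eps0) / (np.exp(eps0) + k - 1)  # probability of reporting the true value
    q = 1.0 / (np.exp(eps0) + k - 1)           # probability of each specific other value
    return np.full((k, k), q) + (p - q) * np.eye(k)

def krr_shuffle(data, k, eps0, rng):
    """Apply k-RR to every user's value, then shuffle the reports."""
    data = np.asarray(data)
    n = len(data)
    p = np.exp(eps0) / (np.exp(eps0) + k - 1)
    keep = rng.random(n) < p
    # A non-zero offset modulo k picks uniformly among the k-1 other values.
    offsets = rng.integers(1, k, size=n)
    reports = np.where(keep, data, (data + offsets) % k)
    # The shuffle leaves the histogram unchanged; it only hides who sent what.
    return rng.permutation(reports)

def estimate_by_inversion(reports, k, eps0):
    """De-noise the empirical distribution: solve C^T x = empirical."""
    empirical = np.bincount(reports, minlength=k) / len(reports)
    return np.linalg.solve(krr_matrix(k, eps0).T, empirical)

def total_variation(p, q):
    """Total variation distance between two distributions on the same bins."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

# Hypothetical example: 5 bins, local parameter eps0 = 1, 100000 users.
rng = np.random.default_rng(0)
k, eps0, n = 5, 1.0, 100_000
true_probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
data = rng.choice(k, size=n, p=true_probs)
true_hist = np.bincount(data, minlength=k) / n

reports = krr_shuffle(data, k, eps0, rng)
noisy_hist = np.bincount(reports, minlength=k) / n
denoised = estimate_by_inversion(reports, k, eps0)

tv_raw = total_variation(noisy_hist, true_hist)
tv_denoised = total_variation(denoised, true_hist)
print(f"TV without inversion: {tv_raw:.4f}")
print(f"TV with inversion:    {tv_denoised:.4f}")
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly; because the $k$-RR matrix is symmetric, the transpose is kept only to make the direction of the channel explicit.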

Paper Structure (16 sections, 2 theorems, 17 equations, 3 figures, 5 tables)

Introduction
Related work
Preliminaries
Tight privacy guarantee for SM
Overview
Framework
Theorems and results
Evaluating the utility of the shuffle model
Experimental results on synthetic data
Tight $\delta$ for histogram queries
Comparing the utility of the shuffle and the central models
Experimental results on real data
Conclusion
Acknowledgment
Proof of Theorem \ref{th:tightdelta_ADP}
...and 1 more section

Key Result

Theorem

For any $\epsilon>0$, we get the tight $(\epsilon,\,\delta)$-ADP guarantee for $\mathcal{M}$ with respect to $x_0,\,x_1$ iff $\delta=\hat{\delta}(\epsilon)$ as in eq:tightdelta of Definition finaldelta where

Figures (3)

Figure 1: (a) and (b): Location data from Gowalla check-ins from a northern part of San Francisco and a part of Paris. (c) and (d) give the heatmap of the locations in the areas of San Francisco and Paris as an alternative visualization.
Figure 2: Estimation of the original distribution from the noisy data obfuscated with the Gaussian mechanism and the SM in San Francisco and Paris dataset
Figure 3: Illustrating the comparison of community level utilities between Gaussian, shuffle and shuffle+INV for varying $n$ and $\epsilon$ in San Francisco and Paris dataset

Theorems & Definitions (15)

Definition: Differential privacy [DworkDifferentialPrivacy]
Definition: Adaptive differential privacy [sommer2019privacy]
Remark
Definition: Tight DP (or ADP) [sommer2019privacy]
Definition: Local differential privacy [duchi2013local]
Definition: k-Randomized Response [kairouz2016discrete]
Definition: Shuffle model [erlingsson2019amplification]
Definition: Histogram query [balcer2019separating]
Definition: Privacy loss random variable [sommer2019privacy]
Definition: Privacy loss distribution [sommer2019privacy]
...and 5 more
