Tight Differential Privacy Guarantees for the Shuffle Model with $k$-Randomized Response
Sayan Biswas, Kangsoo Jung, Catuscia Palamidessi
TL;DR
This work derives a tight $(\epsilon,\delta)$ approximate differential privacy (ADP) bound for the shuffle model with a $k$-Randomized Response ($k$-RR) local randomizer, improving on existing bounds by building on the ADP framework of Sommer et al. It gives a closed-form expression for the DP leakage $\delta(\epsilon)$, obtained by analysing histogram queries and the associated privacy-loss distribution, and shows that the shuffle model offers stronger privacy guarantees than previously known for the same level of local noise. The authors then compare the utility of the shuffle model with that of the central Gaussian mechanism under the same privacy parameters: they de-noise the shuffled histograms with matrix inversion post-processing and measure the total variation distance to the true distribution. In experiments on synthetic data and on the real Gowalla dataset, the central model typically yields slightly better utility, but the gap is small, particularly for smaller sample sizes or with post-processing, indicating that the shuffle model can achieve near-central utility with strong privacy guarantees. The work thus strengthens the theoretical and empirical case for the shuffle model as a practical DP framework with a favorable privacy-utility trade-off for histogram releases.
Abstract
Most differentially private (DP) algorithms assume either a central model, in which a reliable third party adds noise to queries made on datasets, or a local model, in which the users perturb their data locally. However, the central model is vulnerable to a single point of failure, and in the local model the utility of the data deteriorates significantly. The recently proposed shuffle model is an intermediate framework between the central and the local paradigms: the users send their locally privatized data to a server where the messages are shuffled, which effaces the link between a privatized message and the corresponding user. This gives a better trade-off between privacy and utility than the local model, as privacy is amplified without adding more noise. In this paper, we theoretically derive the strictest known bound for the DP guarantee of shuffle models with $k$-Randomized Response ($k$-RR) local randomizers. We then focus on the utility of the shuffle model for histogram queries. Leveraging the matrix inversion method, which is used to approximate the original distribution from the empirical one produced by the $k$-RR mechanism, we de-noise the histogram produced by the shuffle model and evaluate the total variation distance of the resulting histogram from the true one, which we regard as the measure of utility of the privacy mechanism. We perform experiments on both synthetic and real data to compare the privacy-utility trade-off of the shuffle model with that of the central model, privatized by adding state-of-the-art Gaussian noise to each bin. Although the experimental results are consistent with the literature that favours the central model, we see that the difference in statistical utility between the central and the shuffle models is very small, showing that they are almost comparable under the same level of DP.
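The pipeline described in the abstract (local $k$-RR, shuffling, matrix inversion, total variation distance) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard $k$-RR definition, in which a user reports the true value with probability $p = e^{\epsilon_0}/(e^{\epsilon_0}+k-1)$ and each other value with probability $q = 1/(e^{\epsilon_0}+k-1)$, where $\epsilon_0$ denotes the local privacy parameter. Under that assumption the $k$-RR channel matrix has $p$ on the diagonal and $q$ elsewhere, so applying its inverse to the noisy histogram reduces to the entrywise formula $(r_i - q)/(p - q)$. All function names are hypothetical, and the sketch does not project the estimate back onto the probability simplex.

```python
import math
import random
from collections import Counter


def krr_probabilities(k, eps0):
    """Return (p, q): the probability of reporting the true value and the
    probability of reporting one specific other value under k-RR."""
    denom = math.exp(eps0) + k - 1
    return math.exp(eps0) / denom, 1.0 / denom


def krr_randomize(x, k, eps0, rng):
    """Apply k-RR to a single value x in {0, ..., k-1}."""
    p, _ = krr_probabilities(k, eps0)
    if rng.random() < p:
        return x
    # Otherwise choose uniformly among the k-1 values different from x.
    y = rng.randrange(k - 1)
    return y if y < x else y + 1


def shuffle_model(data, k, eps0, rng):
    """Each user randomizes locally; the shuffler then permutes the reports."""
    reports = [krr_randomize(x, k, eps0, rng) for x in data]
    rng.shuffle(reports)
    return reports


def histogram(values, k):
    """Normalized histogram over the domain {0, ..., k-1}."""
    counts = Counter(values)
    n = len(values)
    return [counts[i] / n for i in range(k)]


def invert_krr(noisy_hist, k, eps0):
    """Matrix-inversion estimate of the original distribution.
    Assumes eps0 > 0 so that p != q; entries may fall outside [0, 1]."""
    p, q = krr_probabilities(k, eps0)
    return [(r - q) / (p - q) for r in noisy_hist]


def total_variation(a, b):
    """Total variation distance between two distributions on the same domain."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    k, eps0 = 5, 1.0  # illustrative values, not taken from the paper
    data = [rng.randrange(k) for _ in range(10000)]  # synthetic data
    true_hist = histogram(data, k)
    noisy_hist = histogram(shuffle_model(data, k, eps0, rng), k)
    estimate = invert_krr(noisy_hist, k, eps0)
    print("TV before de-noising:", total_variation(true_hist, noisy_hist))
    print("TV after de-noising: ", total_variation(true_hist, estimate))
```

Shuffling leaves the histogram unchanged, since a histogram does not depend on the order of the reports; its role is to remove the link between a report and its sender, which is what amplifies privacy. The sketch covers only the utility side and does not compute the $\delta(\epsilon)$ bound derived in the paper.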
