Privacy Amplification via Shuffling: Unified, Simplified, and Tightened
Shaowei Wang, Yun Peng, Jin Li, Zikai Wen, Zhipeng Li, Shiyu Yu, Di Wang, Wei Yang
TL;DR
This work addresses the lack of tight, general privacy amplification bounds in the shuffle model of differential privacy. It introduces the variation-ratio reduction framework, which ties shuffle amplification to two parameters, the pairwise total-variation bound β and the probability-ratio bound p/q, and uses mixture decompositions and a binomial-dominance argument to bound the hockey-stick divergence. The authors derive both upper and lower bounds, give a fast $\tilde{O}(n)$ numerical method for computing them, and extend the framework to parallel composition, obtaining substantially tighter privacy guarantees for single- and multi-message protocols. Empirical results show budget savings of up to 30% for single-message protocols, 75% for multi-message ones, and 75%–95% for parallel composition, and the numerical method scales to $n=10^8$ users in under 10 seconds. Overall, the variation-ratio framework provides a unified, tight, and efficient toolkit for analyzing shuffle-based DP across a wide range of mechanisms and settings, with practical implications for real-world private data analysis.
Abstract
The shuffle model of differential privacy provides promising privacy-utility balances in decentralized, privacy-preserving data analysis. However, the current analyses of privacy amplification via shuffling lack both tightness and generality. To address this issue, we propose the \emph{variation-ratio reduction} as a comprehensive framework for privacy amplification in both single-message and multi-message shuffle protocols. It leverages two new parameterizations: the total variation bounds of local messages and the probability ratio bounds of blanket messages, to determine indistinguishability levels. Our theoretical results demonstrate that our framework provides tighter bounds, especially for local randomizers with extremal probability design, where our bounds are exactly tight. Additionally, variation-ratio reduction complements parallel composition in the shuffle model, yielding enhanced privacy accounting for popular sampling-based randomizers employed in statistical queries (e.g., range queries, marginal queries, and frequent itemset mining). Empirical findings demonstrate that our numerical amplification bounds surpass existing ones, conserving up to $30\%$ of the budget for single-message protocols, $75\%$ for multi-message ones, and a striking $75\%$-$95\%$ for parallel composition. Our bounds also result in a remarkably efficient $\tilde{O}(n)$ algorithm that numerically amplifies privacy in less than $10$ seconds for $n=10^8$ users.
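As a point of reference for the indistinguishability levels discussed above, the following is a minimal sketch of the hockey-stick divergence between two discrete distributions, the quantity the framework bounds. It is not the paper's $\tilde{O}(n)$ algorithm. The helper names `hockey_stick` and `binomial_pmf` and the toy binomial parameters are assumptions chosen only for illustration.

```python
import math

def hockey_stick(p, q, eps):
    """Hockey-stick divergence D_{e^eps}(P || Q) for discrete P and Q,
    given as equal-length lists of probabilities over the same support:
    sum over x of max(0, P(x) - e^eps * Q(x))."""
    scale = math.exp(eps)
    return sum(max(0.0, px - scale * qx) for px, qx in zip(p, q))

def binomial_pmf(n, prob):
    """Probability mass function of Binomial(n, prob) on {0, ..., n}."""
    return [math.comb(n, k) * prob**k * (1 - prob)**(n - k) for k in range(n + 1)]

# Toy pair of output distributions (illustrative parameters, not from the paper).
P = binomial_pmf(50, 0.5)
Q = binomial_pmf(50, 0.6)

# A pair (P, Q) is (eps, delta)-indistinguishable when the divergence
# in both directions is at most delta.
eps = 1.0
delta = max(hockey_stick(P, Q, eps), hockey_stick(Q, P, eps))
print(f"eps={eps}, smallest delta={delta:.3e}")
```

At `eps = 0` the divergence reduces to the total variation distance, and it is non-increasing in `eps`, which is why a larger privacy budget ε corresponds to a smaller δ for the same pair of distributions.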
