
Privacy Amplification via Shuffling: Unified, Simplified, and Tightened

Shaowei Wang, Yun Peng, Jin Li, Zikai Wen, Zhipeng Li, Shiyu Yu, Di Wang, Wei Yang

TL;DR

This work targets tight, general privacy amplification bounds in the shuffle model of differential privacy. It introduces the variation-ratio reduction framework, which ties shuffle amplification to two parameters, the pairwise total-variation bound $\beta$ and the probability-ratio bounds $p$ and $q$, and uses mixture decompositions and a binomial-dominance argument to bound the hockey-stick divergence. The authors derive both upper and lower bounds, give a fast $\tilde{O}(n)$ numerical method for computing them, and extend the framework to parallel composition, obtaining tighter privacy guarantees for single- and multi-message protocols. Empirical results show budget savings of up to 30% for single-message protocols, 75% for multi-message protocols, and 75%-95% for parallel composition, and the numerical method scales to $n=10^8$ users in under 10 seconds. Overall, the variation-ratio framework provides a unified, tight, and efficient toolkit for analyzing shuffle-based DP across a wide range of mechanisms and settings.

Abstract

The shuffle model of differential privacy provides promising privacy-utility balances in decentralized, privacy-preserving data analysis. However, the current analyses of privacy amplification via shuffling lack both tightness and generality. To address this issue, we propose the \emph{variation-ratio reduction} as a comprehensive framework for privacy amplification in both single-message and multi-message shuffle protocols. It leverages two new parameterizations: the total variation bounds of local messages and the probability ratio bounds of blanket messages, to determine indistinguishability levels. Our theoretical results demonstrate that our framework provides tighter bounds, especially for local randomizers with extremal probability design, where our bounds are exactly tight. Additionally, variation-ratio reduction complements parallel composition in the shuffle model, yielding enhanced privacy accounting for popular sampling-based randomizers employed in statistical queries (e.g., range queries, marginal queries, and frequent itemset mining). Empirical findings demonstrate that our numerical amplification bounds surpass existing ones, conserving up to $30\%$ of the budget for single-message protocols, $75\%$ for multi-message ones, and a striking $75\%$-$95\%$ for parallel composition. Our bounds also result in a remarkably efficient $\tilde{O}(n)$ algorithm that numerically amplifies privacy in less than $10$ seconds for $n=10^8$ users.
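To make the setting concrete, the following is a minimal sketch of a single-message shuffle protocol. It is an illustration only, not the paper's algorithm: the choice of $k$-ary randomized response as the local randomizer and the names `k_rr`, `shuffle_protocol`, and `estimate_frequencies` are assumptions introduced here.

```python
import math
import random
from collections import Counter


def k_rr(value, d, eps0, rng):
    """k-ary randomized response over {0, ..., d-1} (an eps0-LDP randomizer).

    Illustrative choice of local randomizer; the framework covers many others.
    """
    keep = math.exp(eps0) / (math.exp(eps0) + d - 1)
    if rng.random() < keep:
        return value
    # Otherwise report one of the other d-1 values uniformly at random.
    other = rng.randrange(d - 1)
    return other if other < value else other + 1


def shuffle_protocol(inputs, d, eps0, seed=0):
    """Each user randomizes locally; the shuffler then permutes the messages."""
    rng = random.Random(seed)
    messages = [k_rr(x, d, eps0, rng) for x in inputs]
    rng.shuffle(messages)  # the analyzer sees only the multiset of messages
    return messages


def estimate_frequencies(messages, d, eps0):
    """Unbiased frequency estimates obtained by inverting the k-RR noise."""
    n = len(messages)
    e = math.exp(eps0)
    keep = e / (e + d - 1)   # Pr[report v | true value v]
    flip = 1 / (e + d - 1)   # Pr[report v | true value u != v]
    counts = Counter(messages)
    return [(counts.get(v, 0) / n - flip) / (keep - flip) for v in range(d)]
```

Because the analyzer sees only the shuffled multiset, the central guarantee can be much stronger than the local budget $\epsilon_0$; quantifying that gap is what the amplification bounds are for.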

Paper Structure (30 sections, 11 theorems, 54 equations, 5 figures, 6 tables, 3 algorithms)

This paper contains 30 sections, 11 theorems, 54 equations, 5 figures, 6 tables, 3 algorithms.

Key Result

theorem 1

For $p> 1$, $\beta\in [0, \frac{p-1}{p+1}]$, $q\geq 1$, if randomizers $\{\mathcal{R}_i\}_{i\in [n]}$ satisfy the $(p, \beta)$-variation property and the $q$-ratio property, then for any $x_1^0,x_1^1,x_2,\ldots,x_n\in \mathbb{X}$: where $\alpha=\frac{\beta}{p-1}$, $r=\frac{\alpha p}{q}$, $low_c=\frac{(e^{\epsilon}p-1)\alpha c+(e^{\epsilon}-1)(1-\alpha-\alpha p)\cdot \frac{(n-c)r}{1-2r}}{\alpha(e^{\epsi
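The derived quantities $\alpha=\frac{\beta}{p-1}$ and $r=\frac{\alpha p}{q}$ in the theorem statement can be computed directly from $(p, \beta, q)$. The sketch below checks the stated parameter ranges and evaluates both. The helper names are hypothetical, and the instantiation in `ldp_params` ($p=q=e^{\epsilon_0}$, $\beta=\frac{e^{\epsilon_0}-1}{e^{\epsilon_0}+1}$ for a generic $\epsilon_0$-LDP randomizer) is an assumption made for illustration, not a value taken from the text above.

```python
import math


def variation_ratio_params(p, beta, q):
    """Return (alpha, r) with alpha = beta/(p-1) and r = alpha*p/q.

    Validates the ranges in the theorem statement:
    p > 1, 0 <= beta <= (p-1)/(p+1), q >= 1.
    """
    if not p > 1:
        raise ValueError("p must be > 1")
    if not 0 <= beta <= (p - 1) / (p + 1) + 1e-12:
        raise ValueError("beta must lie in [0, (p-1)/(p+1)]")
    if not q >= 1:
        raise ValueError("q must be >= 1")
    alpha = beta / (p - 1)
    r = alpha * p / q
    return alpha, r


def ldp_params(eps0):
    """ASSUMPTION: worst-case parameters for a generic eps0-LDP randomizer."""
    e = math.exp(eps0)
    return e, (e - 1) / (e + 1), e  # (p, beta, q)


alpha, r = variation_ratio_params(*ldp_params(1.0))
```

Under this assumed instantiation both $\alpha$ and $r$ reduce to $\frac{1}{e^{\epsilon_0}+1}$; randomizers with a smaller total-variation bound $\beta$ yield smaller $\alpha$ and $r$.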

Figures (5)

  • Figure 1: Numerical comparison of amplification effects (base $2$ logarithm of amplification ratio) of subset selection mechanism with $n=10^4$ or $10^5$, domain size $d=16$ or $128$, and varying local budget $\epsilon_0\in [0.1, 5.0]$.
  • Figure 2: Numerical comparison of amplification effects (base $2$ logarithm of amplification ratio) of optimal local hash mechanism with $n=10^4$ or $10^5$, domain size $d=16$ or $128$, and local budget $\epsilon_0\in [0.1, 5.0]$.
  • Figure 3: Numerical comparison of amplification effects (base $2$ logarithm of extra amplification ratio) of the Cheu et al. [cheu2022differentially] multi-message protocol with $n=10^4$ or $10^5$, domain size $d=16$ or $128$, and varying global budget $\epsilon'\in [0.01, 1.5]$.
  • Figure 4: Numerical comparison of amplification effects (base $2$ logarithm of extra amplification ratio) of the balls-into-bins multi-message protocol with $n=\frac{32\log(2/\delta)d}{\epsilon'^2 s}$ [luo2022frequency] and varying global budget $\epsilon'\in [0.01, 1.5]$.
  • Figure 5: Numerical comparison of amplification effects (base $2$ logarithm of amplification ratio) of separated approach, basic parallel composition, and advanced parallel composition.

Theorems & Definitions (17)

  • definition 1: Hockey-stick divergence
  • definition 2: Differential privacy [dwork2006calibrating]
  • definition 3: Local differential privacy [kasiviswanathan2011can]
  • definition 4: Metric $(d_{\mathbb{X}},\delta)$-differential privacy [chatzikokolakis2013broadening]
  • definition 5: Local metric $d_{\mathbb{X}}$-differential privacy [andres2013geo, alvim2018local]
  • definition 6: Differential privacy in the shuffle model
  • theorem 1: Divergence upper bound
  • theorem 2: Analytic privacy amplification bounds
  • theorem 3: Asymptotic privacy amplification bounds
  • lemma 1: Mixture decompositions
  • ...and 7 more
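
The hockey-stick divergence of definition 1 is the quantity the amplification bounds control. For discrete distributions its standard form is $D_{e^{\epsilon}}(P\|Q)=\sum_x \max\{0, P(x)-e^{\epsilon}Q(x)\}$, and a pair of output distributions is $(\epsilon,\delta)$-indistinguishable when the divergence is at most $\delta$ in both directions. The sketch below evaluates it for distributions given as dictionaries; the function names and the binary randomized response example are illustrative assumptions, not code from the paper.

```python
import math


def hockey_stick(P, Q, eps):
    """D_{e^eps}(P || Q) = sum_x max(0, P(x) - e^eps * Q(x)) for discrete P, Q."""
    support = set(P) | set(Q)
    scale = math.exp(eps)
    return sum(max(0.0, P.get(x, 0.0) - scale * Q.get(x, 0.0)) for x in support)


def is_indistinguishable(P, Q, eps, delta):
    """Check the (eps, delta) condition in both directions."""
    return max(hockey_stick(P, Q, eps), hockey_stick(Q, P, eps)) <= delta


# Illustrative example: binary randomized response with local budget eps0 = 1.
e = math.exp(1.0)
P = {0: e / (1 + e), 1: 1 / (1 + e)}  # output distribution on input 0
Q = {0: 1 / (1 + e), 1: e / (1 + e)}  # output distribution on input 1
```

At $\epsilon=0$ the divergence equals the total variation distance, here $\frac{e-1}{e+1}$; at $\epsilon=\epsilon_0=1$ it vanishes up to floating-point rounding, matching the $\epsilon_0$-LDP guarantee of randomized response.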