Mutual Information Bounds in the Shuffle Model

Pengcheng Su, Haibo Cheng, Ping Wang

TL;DR

This paper presents the first systematic information-theoretic study of the single-message shuffle model, covering both the shuffle-only and shuffle-DP regimes. It derives asymptotic mutual-information results for the target position $K$ and the target user's input $X_1$ (which equals the message $Y_1$ in the shuffle-only regime) relative to the shuffled view $\boldsymbol{Z}$, using the basic configuration and the blanket decomposition to handle heterogeneity. In the shuffle-only case, $I(K;\boldsymbol{Z})$ converges to $D_{KL}(P\Vert Q)$ when $P\ll Q$, and $I(Y_1;\boldsymbol{Z})$ scales as $O(1/n)$ with explicit leading terms. In the shuffle-DP setting, $I(K;\boldsymbol{Z}) \le 2\varepsilon_0$ and $I(X_1;\boldsymbol{Z}\mid\boldsymbol{X}_{-1}) \le (e^{\varepsilon_0}-1)/(2n) + O(n^{-3/2})$, where $\boldsymbol{X}_{-1}=(X_i)_{i=2}^n$, linking shuffle amplification to MI privacy. The authors extend the blanket decomposition from DP to the MI framework, providing a unified method to quantify leakage in both settings and across heterogeneous user distributions. This work advances the theoretical understanding of privacy-utility tradeoffs in shuffled data collection and offers a foundation for non-asymptotic analyses and extensions to broader shuffle-based protocols.

Abstract

The shuffle model enhances privacy by anonymizing users' reports through random permutation. This paper presents the first systematic study of the single-message shuffle model from an information-theoretic perspective. We analyze two regimes: the shuffle-only setting, where each user directly submits its message ($Y_i=X_i$), and the shuffle-DP setting, where each user first applies a local $\varepsilon_0$-differentially private mechanism before shuffling ($Y_i=\mathcal{R}(X_i)$). Let $\boldsymbol{Z} = (Y_{\sigma(i)})_i$ denote the shuffled sequence produced by a uniformly random permutation $\sigma$, and let $K = \sigma^{-1}(1)$ represent the position of user 1's message after shuffling. For the shuffle-only setting, we focus on a tractable yet expressive basic configuration, where the target user's message follows $Y_1 \sim P$ and the remaining users' messages are i.i.d. samples from $Q$, i.e., $Y_2,\dots,Y_n \sim Q$. We derive asymptotic expressions for the mutual information quantities $I(Y_1;\boldsymbol{Z})$ and $I(K;\boldsymbol{Z})$ as $n \to \infty$, and demonstrate how this analytical framework naturally extends to settings with heterogeneous user distributions. For the shuffle-DP setting, we establish information-theoretic upper bounds on total information leakage. When each user applies an $\varepsilon_0$-DP mechanism, the overall leakage satisfies $I(K; \boldsymbol{Z}) \le 2\varepsilon_0$ and $I(X_1; \boldsymbol{Z}\mid (X_i)_{i=2}^n) \le (e^{\varepsilon_0}-1)/(2n) + O(n^{-3/2})$. These results bridge shuffle differential privacy and mutual-information-based privacy.
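
The two regimes described in the abstract can be illustrated with a minimal simulation sketch. The function names (`k_randomized_response`, `shuffle_model`) are hypothetical, and $k$-ary randomized response is used here only as one well-known example of a local $\varepsilon_0$-DP randomizer; it is an assumption for illustration, not necessarily the mechanism studied in the paper. Indices are 0-based in the code, so user 1 is index 0.

```python
import math
import random

def k_randomized_response(x, k, eps0, rng):
    """Assumed example of a local eps0-DP randomizer over {0, ..., k-1}.

    Reports x with probability e^eps0 / (e^eps0 + k - 1); otherwise
    reports one of the other k - 1 symbols uniformly at random.
    """
    p_true = math.exp(eps0) / (math.exp(eps0) + k - 1)
    if rng.random() < p_true:
        return x
    other = rng.randrange(k - 1)          # uniform over the k - 1 other symbols
    return other if other < x else other + 1

def shuffle_model(xs, randomizer, rng):
    """Simulate one round: Y_i = randomizer(X_i), Z_i = Y_{sigma(i)}.

    Returns (ys, zs, K), where K is the 0-based position of user 1's
    message in the shuffled view, i.e. K = sigma^{-1}(1).
    """
    n = len(xs)
    ys = [randomizer(x) for x in xs]
    sigma = list(range(n))
    rng.shuffle(sigma)                    # uniformly random permutation
    zs = [ys[sigma[i]] for i in range(n)]
    K = sigma.index(0)                    # position i with sigma(i) = user 1
    return ys, zs, K

rng = random.Random(0)
xs = [2, 0, 1, 3, 3, 1]

# Shuffle-only regime: Y_i = X_i.
ys, zs, K = shuffle_model(xs, lambda x: x, rng)

# Shuffle-DP regime: Y_i = R(X_i) with the assumed randomizer above.
ys_dp, zs_dp, K_dp = shuffle_model(
    xs, lambda x: k_randomized_response(x, 4, 1.0, rng), rng
)
```

In both regimes the analyzer observes only `zs`; the quantities studied in the paper measure how much `zs` reveals about `K` and about user 1's input.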

Paper Structure

This paper contains 21 sections, 23 theorems, 202 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

In the shuffle-only setting with basic configuration $(P,Q)$, conditioning on the shuffled output $\boldsymbol{Z}=\boldsymbol{z}=(z_1,z_2,\dots,z_n)$, the posterior distributions of $K$ and $Y_1$ are given by
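
As a sketch of how the posterior of $K$ arises (derived here directly from Bayes' rule for a finite alphabet, and not a reproduction of the paper's statement): since $K$ is uniform on $\{1,\dots,n\}$ and, given $K=k$, $Z_k\sim P$ while the other coordinates are i.i.d. from $Q$, the joint probability is $\Pr(K=k,\boldsymbol{Z}=\boldsymbol{z}) = \frac{1}{n}P(z_k)\prod_{j\neq k}Q(z_j)$. Normalizing over $k$ gives $\Pr(K=k\mid\boldsymbol{z}) \propto P(z_k)/Q(z_k)$ whenever every $Q(z_j)>0$. The Python sketch below computes this posterior and, by brute-force enumeration for small $n$, the exact $I(K;\boldsymbol{Z})$ in nats. The function names and the dictionary representation of $P$ and $Q$ are assumptions made for illustration.

```python
import itertools
import math

def joint_weights(z, P, Q):
    """Return [Pr(K = k, Z = z)] for each 0-based position k.

    P and Q are dicts mapping symbols to probabilities. K is uniform,
    Z_K ~ P, and the remaining coordinates are i.i.d. from Q.
    """
    n = len(z)
    weights = []
    for k in range(n):
        w = P.get(z[k], 0.0) / n
        for j in range(n):
            if j != k:
                w *= Q.get(z[j], 0.0)
        weights.append(w)
    return weights

def posterior_K(z, P, Q):
    """Posterior Pr(K = k | Z = z) by Bayes' rule (requires Pr(Z = z) > 0)."""
    weights = joint_weights(z, P, Q)
    total = sum(weights)                  # equals Pr(Z = z)
    return [w / total for w in weights]

def mutual_information_K(P, Q, n):
    """Exact I(K; Z) in nats, enumerating all z in alphabet^n (small n only)."""
    alphabet = sorted(set(P) | set(Q))
    mi = 0.0
    for z in itertools.product(alphabet, repeat=n):
        joint = joint_weights(z, P, Q)
        pz = sum(joint)
        for w in joint:
            if w > 0.0:
                # Pr(K = k) = 1/n, so the product of marginals is pz / n.
                mi += w * math.log(w / (pz / n))
    return mi

P = {0: 0.8, 1: 0.2}
Q = {0: 0.5, 1: 0.5}
post = posterior_K((0, 1, 1), P, Q)
mi_small = mutual_information_K(P, Q, 4)
```

For $\boldsymbol{z}=(0,1,1)$ with these example distributions, the likelihood ratios are $1.6, 0.4, 0.4$, so the posterior is $(2/3, 1/6, 1/6)$. The enumeration cost grows as $|\mathcal{Y}|^n$, so this check is only practical for small alphabets and small $n$.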

Figures (4)

  • Figure 1: Illustration of the single-message shuffle model. Each user $i$ generates a local message $Y_i$, either deterministically ($Y_i=\mathrm{Id}(X_i)$) or via a local randomizer ($Y_i=\mathcal{R}(X_i)$). The shuffler then applies a random permutation $\sigma$, uniformly sampled from the permutation group $\mathcal{S}_n$, to produce the anonymized outputs $Z_i = Y_{\sigma(i)}$.
  • Figure 2: Exact vs. asymptotic mutual information in the basic shuffle-only setting with $P = Q$.
  • Figure 3: Exact vs. asymptotic mutual information in the basic shuffle-only setting with $P \ll Q$, and verification of the optimal $Q$.
  • Figure 4: Mutual information in the shuffle-DP setting: numerical estimates vs. asymptotic bounds.

Theorems & Definitions (57)

  • Definition 1: Differential Privacy
  • Definition 2: Local Differential Privacy
  • Definition 3: Differential Privacy in the Shuffle Model
  • Remark 1
  • Theorem 1
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • Example 3.1
  • ...and 47 more