Mutual Information Bounds in the Shuffle Model
Pengcheng Su, Haibo Cheng, Ping Wang
TL;DR
This paper presents the first systematic information-theoretic analysis of the single-message shuffle model, covering both the shuffle-only and shuffle-DP regimes. It derives asymptotic mutual-information results for the target position $K$ and the target input $X_1$ relative to the shuffled view $\boldsymbol{Z}$, using the basic configuration and the blanket decomposition to handle heterogeneity. In the shuffle-only case, $I(K;\boldsymbol{Z})$ converges to $D_{KL}(P\Vert Q)$ when $P\ll Q$, and $I(Y_1;\boldsymbol{Z})$ scales as $O(1/n)$ with explicit leading terms. In the shuffle-DP setting, $I(K;\boldsymbol{Z}) \le 2\varepsilon_0$ and $I(X_1;\boldsymbol{Z}\mid\boldsymbol{X}_{-1}) \le (e^{\varepsilon_0}-1)/(2n) + O(n^{-3/2})$, which links shuffle amplification to MI privacy. The authors extend the blanket decomposition from DP to the MI framework, giving a unified method to quantify leakage in both settings and across heterogeneous user distributions. The work advances the theoretical understanding of privacy-utility tradeoffs in shuffled data collection and offers a foundation for non-asymptotic analyses and extensions to broader shuffle-based protocols.
Abstract
The shuffle model enhances privacy by anonymizing users' reports through random permutation. This paper presents the first systematic study of the single-message shuffle model from an information-theoretic perspective. We analyze two regimes: the shuffle-only setting, where each user directly submits its message ($Y_i=X_i$), and the shuffle-DP setting, where each user first applies a local $\varepsilon_0$-differentially private mechanism before shuffling ($Y_i=\mathcal{R}(X_i)$). Let $\boldsymbol{Z} = (Y_{\sigma(i)})_{i=1}^n$ denote the shuffled sequence produced by a uniformly random permutation $\sigma$, and let $K = \sigma^{-1}(1)$ represent the position of user 1's message after shuffling. For the shuffle-only setting, we focus on a tractable yet expressive \emph{basic configuration}, where the target user's message follows $Y_1 \sim P$ and the remaining users' messages are i.i.d.\ samples from $Q$, i.e., $Y_2,\dots,Y_n \sim Q$. We derive asymptotic expressions for the mutual information quantities $I(Y_1;\boldsymbol{Z})$ and $I(K;\boldsymbol{Z})$ as $n \to \infty$, and demonstrate how this analytical framework naturally extends to settings with heterogeneous user distributions. For the shuffle-DP setting, we establish information-theoretic upper bounds on total information leakage. When each user applies an $\varepsilon_0$-DP mechanism, the overall leakage satisfies $I(K; \boldsymbol{Z}) \le 2\varepsilon_0$ and $I(X_1; \boldsymbol{Z}\mid (X_i)_{i=2}^n) \le (e^{\varepsilon_0}-1)/(2n) + O(n^{-3/2})$. These results bridge shuffle differential privacy and mutual-information-based privacy.
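
To make the shuffle-only quantities concrete, the following minimal sketch computes $I(K;\boldsymbol{Z})$ exactly, by brute-force enumeration, for the basic configuration on a small finite alphabet, so that it can be compared with $D_{KL}(P\Vert Q)$. It is an illustration only and is not taken from the paper: the function names `kl_divergence` and `mi_position_vs_view`, the example distributions, and the use of natural logarithms (nats) are all assumptions, and the code assumes $Q$ has full support so that $P\ll Q$ holds.

```python
import itertools
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mi_position_vs_view(p, q, n):
    """Exact I(K; Z) in nats for the basic configuration.

    User 1's message is drawn from p, the other n - 1 messages are
    i.i.d. from q, and K is uniform on the n positions. The cost is
    exponential in n, so this is only meant for tiny examples.
    """
    alphabet = range(len(p))
    mi = 0.0
    for z in itertools.product(alphabet, repeat=n):
        # joint[k] = Pr[K = k, Z = z]: position k holds the target's message.
        joint = []
        for k in range(n):
            prob = p[z[k]] / n
            for j in range(n):
                if j != k:
                    prob *= q[z[j]]
            joint.append(prob)
        p_z = sum(joint)  # Pr[Z = z]
        for pk in joint:
            if pk > 0:
                # Pr[K = k] = 1 / n, so the product of marginals is p_z / n.
                mi += pk * math.log(pk / (p_z / n))
    return mi


if __name__ == "__main__":
    # Hypothetical example distributions on a binary alphabet.
    P = [0.7, 0.3]
    Q = [0.5, 0.5]
    print("D_KL(P||Q) =", kl_divergence(P, Q))
    for n in (2, 4, 8):
        print("n =", n, " I(K;Z) =", mi_position_vs_view(P, Q, n))
```

Running the script prints the exact mutual information for a few small values of $n$ next to the divergence, which gives a rough, small-scale picture of the limit stated above. The enumeration visits every sequence in the alphabet raised to the power $n$, so it is useful only as a sanity check, not as a way to study large $n$.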
