Bayesian Advantage of Re-Identification Attack in the Shuffle Model

Pengcheng Su, Haibo Cheng, Ping Wang

TL;DR

The paper analyzes re-identification risk in the shuffle model. It formalizes the success probability $\beta_n(P,Q)$ of a Bayes-optimal adversary in identifying the single sample drawn from $P$ among $n-1$ samples drawn from $Q$, and defines additive and multiplicative Bayesian advantages. It gives an exact expression for $\beta_n(P,Q)$ in terms of likelihood-ratio distributions, characterizes its asymptotics, and establishes (nearly) tight mutual bounds between the additive advantage and the total variation distance $\Delta(P,Q)$. The authors extend the analysis to shuffle differential privacy, showing that for an $\varepsilon$-DP local randomizer $\mathcal{R}$ the re-identification probability satisfies $\beta_n(\mathcal{R}) \le \frac{e^{\varepsilon}}{n}$, and they develop a decomposition-based framework (clone and blanket) to obtain tight bounds, with the blanket approach shown to be optimal among decompositions. The results quantify anonymity leakage in shuffle-based systems and give quantitative guidance for honeyword-style defenses and for privacy amplification in shuffle DP.

Abstract

The shuffle model, which anonymizes data by randomly permuting user messages, has been widely adopted in both cryptography and differential privacy. In this work, we present the first systematic study of the Bayesian advantage in re-identifying a user's message under the shuffle model. We begin with a basic setting: one sample is drawn from a distribution $P$, and $n - 1$ samples are drawn from a distribution $Q$, after which all $n$ samples are randomly shuffled. We define $\beta_n(P, Q)$ as the success probability of a Bayes-optimal adversary in identifying the sample from $P$, and define the additive and multiplicative Bayesian advantages as $\mathsf{Adv}_n^{+}(P, Q) = \beta_n(P,Q) - \frac{1}{n}$ and $\mathsf{Adv}_n^{\times}(P, Q) = n \cdot \beta_n(P,Q)$, respectively. We derive exact analytical expressions and asymptotic characterizations of $\beta_n(P, Q)$, along with evaluations in several representative scenarios. Furthermore, we establish (nearly) tight mutual bounds between the additive Bayesian advantage and the total variation distance. Finally, we extend our analysis beyond the basic setting and present, for the first time, an upper bound on the success probability of Bayesian attacks in shuffle differential privacy. Specifically, when the outputs of $n$ users -- each processed through an $\varepsilon$-differentially private local randomizer -- are shuffled, the probability that an attacker successfully re-identifies any target user's message is at most $e^{\varepsilon}/n$.
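
To make the basic setting concrete, the following is a minimal sketch that computes the Bayes-optimal success probability by brute-force enumeration over all output tuples. It is an illustration under stated assumptions, not the paper's closed-form expression: it assumes finite supports and small $n$, and the function names `bayes_success` and `advantages` as well as the randomized-response example are hypothetical choices made here. The binary randomized-response pair is only a special case of the shuffle DP setting, used to compare the computed value against $e^{\varepsilon}/n$.

```python
from itertools import product
from math import exp, prod


def bayes_success(p, q, n):
    """Exact Bayes-optimal success probability for small finite supports.

    p, q: probability lists over the same support {0, ..., len(p) - 1}.
    Enumerates all len(p) ** n output tuples, so keep n small.
    """
    support = range(len(p))
    total = 0.0
    for z in product(support, repeat=n):
        # Joint probability that the shuffled output is z AND the P-sample sits
        # at position i; the shuffle makes each position equally likely (1/n).
        joint = [
            p[z[i]] * prod(q[z[j]] for j in range(n) if j != i) / n
            for i in range(n)
        ]
        # The Bayes-optimal adversary guesses the position with the largest
        # joint probability, so it is correct with exactly that much mass.
        total += max(joint)
    return total


def advantages(p, q, n):
    """Additive and multiplicative advantages as defined in the abstract."""
    beta = bayes_success(p, q, n)
    return beta - 1.0 / n, n * beta


# Hypothetical example: binary randomized response with epsilon = 1.
eps, n = 1.0, 8
keep = exp(eps) / (1.0 + exp(eps))
P = [keep, 1.0 - keep]  # randomizer output distribution when the input is 0
Q = [1.0 - keep, keep]  # randomizer output distribution when the input is 1
beta = bayes_success(P, Q, n)
print(beta, exp(eps) / n)  # compare the exact value with the e^eps / n bound
```

Enumeration is exponential in $n$, so this is only suitable for sanity checks on toy distributions; a Monte Carlo estimate would be the natural substitute for larger $n$.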

Paper Structure

This paper contains 23 sections, 13 theorems, 108 equations, 5 figures, 5 algorithms.

Key Result

Corollary 1

If $P = Q$, then for any $n \ge 1$, the success probability of any adversary $\mathcal{A}$ in the basic setting is exactly $1/n$. In particular, $\beta_n(P, P) = \frac{1}{n}$.
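
A brief sketch of why the corollary holds, assuming a uniform prior over the target's position after shuffling. The symbol $I$ for that position is notation introduced here, not taken from the paper. For any observed output $\boldsymbol{z} = (z_1, \dots, z_n)$ with positive probability, Bayes' rule gives

$$
\Pr[I = i \mid \boldsymbol{z}]
= \frac{\frac{1}{n}\, P(z_i) \prod_{j \ne i} Q(z_j)}{\sum_{k=1}^{n} \frac{1}{n}\, P(z_k) \prod_{j \ne k} Q(z_j)}
\;\overset{P = Q}{=}\; \frac{\prod_{j=1}^{n} P(z_j)}{n \prod_{j=1}^{n} P(z_j)}
= \frac{1}{n}.
$$

Every position is equally likely given the observation, so any guessing rule, optimal or not, succeeds with probability $1/n$.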

Figures (5)

  • Figure 1: Re-identification attack in the shuffle model. Each user $i$ generates $y_i \sim P_i$, and the shuffler randomly permutes the outputs. The adversary observes $\boldsymbol{z} = (y_{\sigma(1)}, \dots, y_{\sigma(n)})$ and tries to infer the position of a target message.
  • Figure 2: $\beta_n^k(P,Q)$ as a function of $k$ for $n=20$ in Example \ref{exmp1}
  • Figure 3: $\beta_n(P,Q)$ vs. $n$ where $P=\mathrm{Zipf}(0.7)$ and $Q$ is uniform
  • Figure 4: $\beta_{20}^k(P,Q)$ vs. $k$ where $P=\mathrm{Zipf}(0.7)$ and $Q$ is uniform
  • Figure 5: Decomposition methods for $1$-DP Laplace mechanism [Dwork2006] on $\{0,1\}$

Theorems & Definitions (40)

  • Definition 1: Differential Privacy
  • Definition 2: Local Differential Privacy
  • Definition 3: Differential Privacy in the Shuffle Model
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7
  • Example 3.1
  • Definition 8
  • Corollary 1
  • ...and 30 more