Private Frequency Estimation Via Residue Number Systems

Héber H. Arcolezi

TL;DR

This work introduces ModularSubsetSelection (MSS), a locally differentially private (LDP) frequency-estimation mechanism that uses a Residue Number System (RNS) to balance utility, communication, server computation, and attack resistance. Each user randomly selects one modulus block and applies SubsetSelection (SS) to the corresponding residue, so a report costs $\lceil \log_2 \ell \rceil + \lceil \log_2 m_j \rceil$ bits. The server decodes with a variance-weighted, CRT-based least-squares solver, yielding MSE within a constant factor of state-of-the-art methods such as SS and ProjectiveGeometryResponse (PGR). The paper provides a theoretical analysis of MSE, bounds on the data reconstruction attack (DRA), and an optimized moduli selection, together with experiments showing that MSS matches the accuracy of these methods while sending shorter messages than SS, decoding faster than PGR, and exhibiting the lowest empirical DRA among the evaluated protocols. Overall, MSS offers a practical, flexible approach to large-domain LDP frequency estimation.

Abstract

We present ModularSubsetSelection (MSS), a new algorithm for locally differentially private (LDP) frequency estimation. Given a universe of size $k$ and $n$ users, our $\varepsilon$-LDP mechanism encodes each input via a Residue Number System (RNS) over $\ell$ pairwise-coprime moduli $m_0, \ldots, m_{\ell-1}$, and reports a randomly chosen index $j \in [\ell]$ along with the perturbed residue using the statistically optimal SubsetSelection (SS) (Wang et al. 2016). This design reduces the user communication cost from $\Theta\bigl(\omega\log_2(k/\omega)\bigr)$ bits required by standard SS (with $\omega \approx k/(e^\varepsilon+1)$) down to $\lceil \log_2 \ell \rceil + \lceil \log_2 m_j \rceil$ bits, where $m_j < k$. Server-side decoding runs in $\Theta(n + r k \ell)$ time, where $r$ is the number of LSMR (Fong and Saunders 2011) iterations. In practice, with well-conditioned moduli (i.e., constant $r$ and $\ell = \Theta(\log k)$), this becomes $\Theta(n + k \log k)$. We prove that MSS achieves worst-case MSE within a constant factor of state-of-the-art protocols such as SS and ProjectiveGeometryResponse (PGR) (Feldman et al. 2022) while avoiding the algebraic prerequisites and dynamic-programming decoder required by PGR. Empirically, MSS matches the estimation accuracy of SS, PGR, and RAPPOR (Erlingsson, Pihur, and Korolova 2014) across realistic $(k, \varepsilon)$ settings, while offering faster decoding than PGR and shorter user messages than SS. Lastly, by sampling from multiple moduli and reporting only a single perturbed residue, MSS achieves the lowest reconstruction-attack success rate among all evaluated LDP protocols.
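To make the RNS encoding concrete, the following minimal Python sketch maps a value to its residues and recovers it with the Chinese Remainder Theorem (CRT). It illustrates only the noiseless representation, not the paper's perturbation step or its least-squares decoder. The moduli `(7, 11, 13, 17)`, the helper names, and the requirement that the product of the moduli be at least $k$ (needed for the round trip to be unique) are assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations
from math import gcd, prod


def rns_encode(x, moduli):
    """Return the residues of x modulo each modulus (hypothetical helper)."""
    return [x % m for m in moduli]


def crt_decode(residues, moduli):
    """Recover x in [0, prod(moduli)) from its residues via the CRT."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi mod m (Python 3.8+).
        x += r * Mi * pow(Mi, -1, m)
    return x % M


# Illustrative moduli only; the paper selects its moduli by an optimization.
moduli = (7, 11, 13, 17)
k = 1024

# The moduli must be pairwise coprime, and their product must cover the domain.
assert all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))
assert prod(moduli) >= k

residues = rns_encode(1000, moduli)
print(residues)                      # [6, 10, 12, 14]
print(crt_decode(residues, moduli))  # 1000
```

In MSS a user would report only one of these residues (after perturbation) together with its index $j$, which is why each message stays short.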

Paper Structure

This paper contains 66 sections, 5 theorems, 65 equations, 9 figures, 2 tables, 3 algorithms.

Key Result

Theorem 1

ModularSubsetSelection (the client-side algorithm, alg:mss-client) satisfies $\varepsilon$-local differential privacy.
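The intuition behind this guarantee can be checked numerically. The sketch below assumes the standard SubsetSelection mechanism over a domain of size $m$: the true value is placed in the reported size-$\omega$ subset with probability $\omega e^\varepsilon / (\omega e^\varepsilon + m - \omega)$, and the remaining elements are drawn uniformly. It also assumes that the index $j$ is drawn uniformly and independently of the input, and that $\omega_j \approx m_j/(e^\varepsilon+1)$. The function names and moduli are hypothetical, and this is not the paper's proof.

```python
from math import comb, exp, isclose


def ss_subset_prob(m, omega, eps, true_in_subset):
    """Probability that standard SS outputs one specific size-omega subset."""
    p = omega * exp(eps) / (omega * exp(eps) + m - omega)
    if true_in_subset:
        # True value included, the other omega - 1 elements chosen uniformly.
        return p / comb(m - 1, omega - 1)
    # True value excluded, all omega elements chosen from the other m - 1.
    return (1 - p) / comb(m - 1, omega)


def mss_report_prob(moduli, j, omega, eps, true_in_subset):
    """Probability of the report (j, subset) when j is uniform (assumption)."""
    return ss_subset_prob(moduli[j], omega, eps, true_in_subset) / len(moduli)


eps = 1.0
moduli = (7, 11, 13, 17)  # illustrative only
for j, m in enumerate(moduli):
    omega = max(1, round(m / (exp(eps) + 1)))
    hi = mss_report_prob(moduli, j, omega, eps, True)
    lo = mss_report_prob(moduli, j, omega, eps, False)
    # A fixed report has only these two possible probabilities across inputs,
    # so hi / lo is the worst-case likelihood ratio.
    assert isclose(hi / lo, exp(eps), rel_tol=1e-9)
```

Because the factor $1/\ell$ for choosing $j$ is the same for every input, it cancels in the ratio, leaving exactly the $e^\varepsilon$ bound of SS on the selected residue.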

Figures (9)

  • Figure 1: MSE vs. privacy parameter $\varepsilon$ for $k = 22{,}000$ and $n = 10{,}000$, under (a) Zipf and (b) Spike distributions. MSS closely tracks the near-optimal error curves of SS and PGR.
  • Figure 2: Per-user message length (bits) of SS and MSS as a function of $\varepsilon$, for two domain sizes. MSS consistently requires fewer bits than SS, especially in high-privacy regimes.
  • Figure 3: Empirical Data Reconstruction Attack (DRA) success rate of each protocol under the Zipf distribution, evaluated over $n = 10{,}000$ users. MSS provides the strongest protection across both small and large domains.
  • Figure 4: Comparison between analytical and empirical MSE for the SS and MSS protocols across a range of privacy budgets $\varepsilon$, for $k = 1{,}024$ and $k = 22{,}000$, under both Zipf and Spike distributions. Each empirical MSE is averaged over 300 runs, while the analytical curves are computed from closed-form expressions.
  • Figure 5: Ablation study showing the impact of the number of moduli $\ell \in \{3, 6, 9, 12, 15\}$ on the utility (left), communication cost (middle), and attackability (right) of the MSS protocol. The dashed black curve represents the performance of our ModularSubsetSelection protocol (i.e., MSS[OPT]), which automatically selects $\ell$ and the moduli via our analytical optimization procedure. Results are averaged over 300 runs for $k=1{,}024$, under the Zipf distribution.
  • ...and 4 more figures
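The message-length gap shown in Figure 2 can be approximated with a short calculation. The sketch below evaluates the MSS cost $\lceil \log_2 \ell \rceil + \lceil \log_2 m_j \rceil$ stated in the abstract and compares it with the number of bits needed to index one size-$\omega$ subset of a $k$-element domain, with $\omega \approx k/(e^\varepsilon+1)$. Counting SS bits this way is an assumption consistent with the $\Theta(\omega \log_2(k/\omega))$ rate, and the moduli are illustrative rather than the paper's optimized choice.

```python
from math import comb, exp


def bits_to_index(n_items):
    """ceil(log2(n_items)) for n_items >= 1, using exact integer arithmetic."""
    return (n_items - 1).bit_length()


def ss_bits(k, eps):
    """Bits to name one size-omega subset of a k-element domain (assumption)."""
    omega = max(1, round(k / (exp(eps) + 1)))
    return bits_to_index(comb(k, omega))


def mss_bits(moduli, j):
    """ceil(log2(l)) + ceil(log2(m_j)), the per-user cost from the abstract."""
    return bits_to_index(len(moduli)) + bits_to_index(moduli[j])


k, eps = 1024, 1.0
moduli = (7, 11, 13, 17)  # illustrative only

worst_mss = max(mss_bits(moduli, j) for j in range(len(moduli)))
print(worst_mss)  # 7  (2 bits for the index, 5 bits for a residue mod 17)
print(ss_bits(k, eps) > worst_mss)  # True
```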

Theorems & Definitions (12)

  • Definition 1: Residue Number System (RNS) Szabó1967
  • Theorem 1: Privacy of MSS
  • proof
  • Theorem 2: Exact unbiasedness, $\lambda=0$
  • proof
  • Corollary 1: Asymptotic Unbiasedness, $\lambda > 0$
  • Remark 1
  • Theorem 3: Worst-Case MSE of MSS
  • proof: Proof Sketch
  • Theorem 4: Condition-number bound
  • ...and 2 more