Private Frequency Estimation Via Residue Number Systems
Héber H. Arcolezi
TL;DR
This work introduces ModularSubsetSelection (MSS), a local-DP frequency-estimation mechanism that harnesses a Residue Number System to balance utility, communication, server computation, and attack resistance. By randomly selecting a modulus block and applying SubsetSelection to the corresponding residue, MSS achieves communication costs of $\lceil \log_2 \ell \rceil + \lceil \log_2 m_j \rceil$ bits per user and decodes via a variance-weighted, CRT-based least-squares solver, yielding MSE within a constant factor of state-of-the-art methods like SS and PGR. The paper provides a theoretical analysis of MSE, DRA bounds, and optimized moduli selection, along with extensive experiments showing MSS matches accuracy while reducing communication and accelerating server-side decoding, and exhibiting the lowest empirical DRA among evaluated protocols. Overall, MSS offers a practical, flexible approach for large-domain LDP frequency estimation, enabling scalable deployment with robust privacy guarantees and efficient computation.
Abstract
We present \textsf{ModularSubsetSelection} (MSS), a new algorithm for locally differentially private (LDP) frequency estimation. Given a universe of size $k$ and $n$ users, our $\varepsilon$-LDP mechanism encodes each input via a Residue Number System (RNS) over $\ell$ pairwise-coprime moduli $m_0, \ldots, m_{\ell-1}$, and reports a randomly chosen index $j \in [\ell]$ along with the perturbed residue using the statistically optimal \textsf{SubsetSelection} (SS) (Wang et al. 2016). This design reduces the user communication cost from $Θ\bigl(ω\log_2(k/ω)\bigr)$ bits required by standard SS (with $ω\approx k/(e^\varepsilon+1)$) down to $\lceil \log_2 \ell \rceil + \lceil \log_2 m_j \rceil$ bits, where $m_j < k$. Server-side decoding runs in $Θ(n + r k \ell)$ time, where $r$ is the number of LSMR (Fong and Saunders 2011) iterations. In practice, with well-conditioned moduli (\textit{i.e.}, constant $r$ and $\ell = Θ(\log k)$), this becomes $Θ(n + k \log k)$. We prove that MSS achieves worst-case MSE within a constant factor of state-of-the-art protocols such as SS and \textsf{ProjectiveGeometryResponse} (PGR) (Feldman et al. 2022) while avoiding the algebraic prerequisites and dynamic-programming decoder required by PGR. Empirically, MSS matches the estimation accuracy of SS, PGR, and \textsf{RAPPOR} (Erlingsson, Pihur, and Korolova 2014) across realistic $(k, \varepsilon)$ settings, while offering faster decoding than PGR and shorter user messages than SS. Lastly, by sampling from multiple moduli and reporting only a single perturbed residue, MSS achieves the lowest reconstruction-attack success rate among all evaluated LDP protocols.
