Lightweight Protocols for Distributed Private Quantile Estimation
Anders Aamand, Fabrizio Boninsegna, Abigail Gentle, Jacob Imola, Rasmus Pagh
TL;DR
The paper tackles quantile estimation under local privacy and shuffle privacy in a distributed setting where each user holds one datum. It develops a sequentially adaptive LDP median estimator based on noisy binary search and BayeSS, achieving sample complexity $n = O\left(\frac{\log B}{\varepsilon^2 \alpha^2}\right)$, which matches the lower bound in the high-privacy regime, and proves a fundamental separation from non-adaptive approaches. It extends the framework to shuffle-DP with a multi-round batched strategy, achieving near-optimal dependencies on $\log B$, $\alpha$, and $\varepsilon$ while balancing adaptivity and privacy amplification. Experiments show adaptive LDP outperforming non-adaptive baselines, and the results highlight the practical utility of combining adaptivity with shuffling to improve privacy-utility trade-offs in distributed private quantile estimation.
Abstract
Distributed data analysis is a large and growing field driven by a massive proliferation of user devices, and by privacy concerns surrounding the centralised storage of data. We consider two \emph{adaptive} algorithms for estimating one quantile (e.g.~the median) when each user holds a single data point, lying in a domain $[B]$, that can be queried once through a private mechanism: one under local differential privacy (LDP) and another under shuffle differential privacy (shuffle-DP). In the adaptive setting we present an $\varepsilon$-LDP algorithm which can estimate any quantile within error $\alpha$ using only $O(\frac{\log B}{\varepsilon^2\alpha^2})$ users, and an $(\varepsilon,\delta)$-shuffle-DP algorithm requiring only $\widetilde{O}((\frac{1}{\varepsilon^2}+\frac{1}{\alpha^2})\log B)$ users. Prior (non-adaptive) algorithms require more users by several logarithmic factors in $B$. We further provide a matching lower bound for adaptive protocols, showing that our LDP algorithm is optimal in the low-$\varepsilon$ regime. Additionally, we establish lower bounds against non-adaptive protocols which, paired with our understanding of the adaptive case, prove a fundamental separation between these models.
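
To make the adaptive setting concrete, the following is a minimal Python sketch of a naive batched binary search in which each user answers a single threshold query through binary randomized response. It is an illustration only, not the paper's algorithm: it does not use BayeSS, it is not expected to attain the $O(\frac{\log B}{\varepsilon^2\alpha^2})$ bound, and the names `randomized_response`, `debiased_mean`, and `naive_ldp_quantile` are hypothetical. It assumes the domain is $\{1, \dots, B\}$ and that users are split evenly into $\lceil \log_2 B \rceil$ batches, one per round.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.

    This is the standard binary randomized response mechanism, which is eps-LDP.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def debiased_mean(reports, epsilon):
    """Unbiased estimate of the true fraction of 1-bits from noisy reports."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = f * (2p - 1) + (1 - p), solved here for the true fraction f.
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


def naive_ldp_quantile(data, domain_size, q, epsilon, seed=0):
    """Naive adaptive sketch (NOT the paper's method): one threshold query per user.

    Each round asks a fresh batch of users "is your value <= mid?", where mid
    depends on the answers of earlier batches. Every user is queried once.
    """
    rng = random.Random(seed)
    users = list(data)
    rng.shuffle(users)  # assign users to rounds at random

    rounds = max(1, math.ceil(math.log2(domain_size)))
    batch = len(users) // rounds
    if batch == 0:
        raise ValueError("need at least one user per round")

    lo, hi = 1, domain_size
    for r in range(rounds):
        if lo >= hi:
            break
        mid = (lo + hi) // 2
        group = users[r * batch:(r + 1) * batch]
        reports = [randomized_response(int(x <= mid), epsilon, rng) for x in group]
        # Estimated CDF at mid; move towards the smallest point with CDF >= q.
        if debiased_mean(reports, epsilon) >= q:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

Calling, for example, `naive_ldp_quantile(data, domain_size=1024, q=0.5, epsilon=1.0)` returns a private estimate of the median. Because every comparison is noisy, a single wrong step cannot be undone in this sketch, which is one reason more careful noisy binary search procedures are of interest.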
