Lightweight Protocols for Distributed Private Quantile Estimation

Anders Aamand, Fabrizio Boninsegna, Abigail Gentle, Jacob Imola, Rasmus Pagh

TL;DR

The paper studies quantile estimation under local and shuffle differential privacy when the data is distributed and each user holds a single data point. It develops a sequentially adaptive LDP median estimator based on noisy binary search and BayeSS that achieves a tight sample complexity of $n = O\left(\frac{\log B}{\varepsilon^2 \alpha^2}\right)$, matches the lower bound in the high-privacy regime, and demonstrates a fundamental separation from non-adaptive approaches. It extends the framework to shuffle-DP with a multi-round batched strategy that achieves near-optimal dependence on $\log B$, $\alpha$, and $\varepsilon$ while balancing adaptivity against privacy amplification. In experiments, the adaptive LDP protocol outperforms non-adaptive baselines, and the results highlight the practical value of combining adaptivity with shuffling to improve the privacy-utility trade-off in distributed private quantile estimation.

Abstract

Distributed data analysis is a large and growing field driven by a massive proliferation of user devices, and by privacy concerns surrounding the centralised storage of data. We consider two adaptive algorithms for estimating one quantile (e.g. the median) when each user holds a single data point lying in a domain $[B]$ that can be queried once through a private mechanism: one under local differential privacy (LDP) and another under shuffle differential privacy (shuffle-DP). In the adaptive setting we present an $\varepsilon$-LDP algorithm which can estimate any quantile within error $\alpha$ using only $O(\frac{\log B}{\varepsilon^2\alpha^2})$ users, and an $(\varepsilon,\delta)$-shuffle-DP algorithm requiring only $\widetilde{O}((\frac{1}{\varepsilon^2}+\frac{1}{\alpha^2})\log B)$ users. Prior (non-adaptive) algorithms require more users by several logarithmic factors in $B$. We further provide a matching lower bound for adaptive protocols, showing that our LDP algorithm is optimal in the low-$\varepsilon$ regime. Additionally, we establish lower bounds against non-adaptive protocols which, paired with our understanding of the adaptive case, prove a fundamental separation between these models.
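To make the setting concrete, here is a minimal Python sketch of a sequentially adaptive $\varepsilon$-LDP quantile search in which every user answers exactly one threshold query through binary randomized response. It is a plain binary search with equal-sized batches per round, not the paper's noisy binary search or BayeSS procedure, and it does not attain the stated sample complexity. All function names, the batching rule, and the seed handling are assumptions made for illustration.

```python
import math
import random


def randomized_response(bit, eps, rng):
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    p_truth = math.exp(eps) / (math.exp(eps) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit


def debiased_fraction(reports, eps):
    """Unbiased estimate of the mean of the true bits behind the reports."""
    p = math.exp(eps) / (math.exp(eps) + 1.0)
    mean = sum(reports) / len(reports)
    # E[report] = (1 - p) + (2p - 1) * true_bit, so invert that affine map.
    return (mean - (1.0 - p)) / (2.0 * p - 1.0)


def ldp_quantile_binary_search(data, B, eps, q=0.5, seed=0):
    """Naive adaptive search over {1, ..., B}; each user is queried once."""
    rng = random.Random(seed)
    users = list(data)
    rng.shuffle(users)  # assign users to rounds at random
    rounds = max(1, math.ceil(math.log2(B)))
    batch = len(users) // rounds
    if batch == 0:
        raise ValueError("need at least ceil(log2(B)) users")
    lo, hi = 1, B
    for r in range(rounds):
        if lo >= hi:
            break
        mid = (lo + hi) // 2
        group = users[r * batch:(r + 1) * batch]  # fresh users each round
        reports = [randomized_response(int(x <= mid), eps, rng) for x in group]
        if debiased_fraction(reports, eps) >= q:
            hi = mid       # estimated CDF at mid reaches q
        else:
            lo = mid + 1   # estimated CDF at mid is below q
    return lo
```

Because the threshold in each round depends on earlier reports, the protocol is adaptive, and because every user sends a single randomized-response bit, the whole interaction is $\varepsilon$-LDP. For example, `ldp_quantile_binary_search(data, B=1024, eps=1.0)` returns an estimate of the median of `data`.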

Paper Structure

This paper contains 37 sections, 27 theorems, 79 equations, 8 figures, 1 algorithm.

Key Result

Theorem 1.1

For all $\alpha \in (0,\frac{1}{4})$, and $\varepsilon\in(0,1)$, there exists a sequentially adaptive $\varepsilon$-LDP protocol solving LDPemp-median$(\{x_i\}_{i=1}^n,\alpha,\varepsilon)$ with probability at least $1-\frac{1}{B}$ for any dataset with $n\geq c\frac{\log B}{\varepsilon^2\alpha^2}$ fo
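To get a feel for how the bound $n\geq c\frac{\log B}{\varepsilon^2\alpha^2}$ scales, the following Python helper evaluates its right-hand side. The source does not give a value for the constant $c$ or the base of the logarithm, so the default `c=1.0` and the natural logarithm are placeholder assumptions, and the function name is hypothetical. The output indicates scaling only and is not an actual user requirement.

```python
import math


def ldp_users_needed(B, eps, alpha, c=1.0):
    """Evaluate c * log(B) / (eps^2 * alpha^2), rounded up.

    c is a placeholder for the unspecified constant in Theorem 1.1,
    and the natural logarithm is an assumption.
    """
    if not (0 < alpha < 0.25 and 0 < eps < 1):
        raise ValueError("Theorem 1.1 is stated for alpha in (0, 1/4) and eps in (0, 1)")
    return math.ceil(c * math.log(B) / (eps ** 2 * alpha ** 2))
```

Halving either $\varepsilon$ or $\alpha$ roughly quadruples the result, while squaring the domain size $B$ only doubles it.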

Figures (8)

  • Figure 1: Pareto-like data with $n=2500$ and $B=4^9$
  • Figure 2: Uniform data with $n=2500$ and $\varepsilon=1$
  • Figure 4: Experiments to estimate the best constant $c$ to compute $\alpha_{\text{update}}=c\sqrt{\frac{\log B}{n}}$.
  • Figure 5: Experiments run over a dataset obtained by sampling $n$ random integers over a random subset of $[B]$.
  • Figure 7: Comparison analysis.
  • ...and 3 more figures

Theorems & Definitions (48)

  • Theorem 1.1
  • Theorem 1.2
  • Theorem 1.3
  • Theorem 1.4
  • Definition 2.1
  • Definition 2.2: LDPstat-median
  • Definition 2.3: LDPemp-median
  • Definition 2.4: shuffle-emp-median
  • Lemma 3.1
  • Theorem 3.2
  • ...and 38 more