On Computing Pairwise Statistics with Local Differential Privacy
Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Adam Sealfon
TL;DR
This work studies how to compute pairwise statistics under local differential privacy by reducing the problem to quadratic forms on histograms and reusing private algorithms for linear queries. It gives a non-interactive local-DP mechanism for quadratic forms with a near-tight error bound $\mathrm{MSE} = O\left( \frac{\zeta(W,n)^2 \log k}{\varepsilon^2 n} \right)$, derives corresponding lower bounds, and instantiates these results for concrete metrics such as Kendall's $\tau$, AUC, and Gini-based indices. The authors also present an interactive three-round protocol that achieves $\mathrm{MSE} = O\left( \frac{\|W\|_\infty^2}{\varepsilon^2 n} \right)$ for large $n$, proving a separation between interactive and non-interactive local DP for these tasks. By bounding the factorization norm $\gamma_2(W)$ for specific kernels and applying projection/JL-based dimensionality reduction, the paper obtains explicit error bounds for a range of widely used pairwise statistics. Open questions include removing the residual log factors and extending the approach to higher-degree statistics.
Abstract
We study the problem of computing pairwise statistics, i.e., ones of the form $\binom{n}{2}^{-1} \sum_{i \ne j} f(x_i, x_j)$, where $x_i$ denotes the input to the $i$th user, with differential privacy (DP) in the local model. This formulation captures important metrics such as Kendall's $\tau$ coefficient, Area Under Curve, Gini's mean difference, and Gini's entropy. We give several novel and generic algorithms for the problem, leveraging techniques from DP algorithms for linear queries.
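To make the definition concrete, the following minimal sketch evaluates a pairwise statistic in two ways: directly from the formula, and through the histogram of the inputs as a quadratic form. It is a non-private illustration only and is not the paper's DP mechanism. The function names, the discrete domain $\{0, \dots, k-1\}$, and the choice $f(a, b) = |a - b|$ (the kernel behind Gini's mean difference) are assumptions made for the example. The sum is taken over ordered pairs $i \ne j$, exactly as the formula is written, so for a symmetric $f$ the result is twice the average over unordered pairs.

```python
from itertools import permutations
from math import comb

import numpy as np


def pairwise_statistic(xs, f):
    """Direct evaluation: sum of f over ordered pairs i != j, divided by C(n, 2)."""
    n = len(xs)
    total = sum(f(xs[i], xs[j]) for i, j in permutations(range(n), 2))
    return total / comb(n, 2)


def pairwise_statistic_from_histogram(xs, f, k):
    """Same quantity computed from the histogram h over the domain {0, ..., k-1}."""
    n = len(xs)
    h = np.bincount(xs, minlength=k).astype(float)
    # W[a, b] = f(a, b) is the kernel matrix of the statistic (illustrative name).
    W = np.array([[f(a, b) for b in range(k)] for a in range(k)], dtype=float)
    # h^T W h sums f over all ordered pairs including i == j,
    # so the diagonal contribution sum_a h[a] * W[a, a] is subtracted.
    total = h @ W @ h - h @ np.diag(W)
    return total / comb(n, 2)


def abs_diff(a, b):
    """Example kernel: absolute difference of two values."""
    return abs(a - b)


xs = [0, 2, 2, 3, 1]  # toy inputs, one value per user
direct = pairwise_statistic(xs, abs_diff)
via_hist = pairwise_statistic_from_histogram(xs, abs_diff, k=4)
print(direct, via_hist)
```

Both functions return the same value, which shows why an estimate of the histogram is enough to estimate the statistic: the inputs enter only through $h$.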
