On Computing Pairwise Statistics with Local Differential Privacy

Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Adam Sealfon

TL;DR

This work studies the private computation of pairwise statistics under local differential privacy by reducing the problem to quadratic forms on histograms and leveraging private algorithms for linear queries. It gives a non-interactive local-DP mechanism for quadratic forms with a near-tight error bound $\mathrm{MSE} = O\left( \frac{\zeta(W,n)^2 \log k}{\varepsilon^2 n} \right)$, derives corresponding lower bounds, and instantiates the results for concrete metrics such as Kendall's $\tau$, AUC, and Gini-based indices. The authors also present an interactive three-round protocol that achieves $\mathrm{MSE} = O\left( \frac{\|W\|_\infty^2}{\varepsilon^2 n} \right)$ for large $n$, proving a separation between interactive and non-interactive local DP for these tasks. By bounding the factorization norm $\gamma_2(W)$ for specific kernels and using projection/JL-based dimensionality reduction, the paper obtains explicit bounds for a range of widely used pairwise statistics. The results advance private analytics of pairwise metrics and leave open questions about removing residual log factors and extending to higher-degree statistics.
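For readers new to the local model, the following is a minimal sketch of a standard baseline, $k$-ary randomized response, in which each user perturbs their own value before sending it and the server debiases the resulting histogram. This is not the mechanism from the paper; the function names `k_rr_report` and `debias_histogram` are hypothetical and chosen only for illustration.

```python
import numpy as np

def k_rr_report(x, k, eps, rng):
    """One user's eps-local-DP report of x in {0, ..., k-1} (k >= 2).

    Keeps x with probability e^eps / (e^eps + k - 1); otherwise reports
    one of the other k - 1 values uniformly at random.
    """
    p_keep = np.exp(eps) / (np.exp(eps) + k - 1)
    if rng.random() < p_keep:
        return x
    other = int(rng.integers(k - 1))      # uniform over {0, ..., k-2}
    return other if other < x else other + 1  # skip over x itself

def debias_histogram(report_counts, eps):
    """Unbiased estimate of the true histogram from randomized-response counts."""
    report_counts = np.asarray(report_counts, dtype=float)
    k = len(report_counts)
    n = report_counts.sum()
    p = np.exp(eps) / (np.exp(eps) + k - 1)   # P[report = a | input = a]
    q = 1.0 / (np.exp(eps) + k - 1)           # P[report = a | input = b != a]
    # E[count_a] = h_a * p + (n - h_a) * q, so invert that affine map.
    return (report_counts - n * q) / (p - q)
```

The ratio $p/q = e^{\varepsilon}$ is what makes each report $\varepsilon$-locally private. Plugging the debiased histogram directly into a quadratic form would generally give a biased estimate, which is one reason more careful mechanisms are needed.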

Abstract

We study the problem of computing pairwise statistics, i.e., ones of the form $\binom{n}{2}^{-1} \sum_{i \ne j} f(x_i, x_j)$, where $x_i$ denotes the input to the $i$th user, with differential privacy (DP) in the local model. This formulation captures important metrics such as Kendall's $\tau$ coefficient, Area Under Curve, Gini's mean difference, Gini's entropy, etc. We give several novel and generic algorithms for the problem, leveraging techniques from DP algorithms for linear queries.
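As a non-private illustration of the definition above, the sketch below computes a pairwise statistic in two ways: directly over pairs, and as a quadratic form $h^\top W h$ on the histogram $h$ of the inputs, with $W_{ab} = f(a, b)$. It assumes inputs are integers in $\{0, \dots, k-1\}$ and that $f$ is symmetric, and it averages over unordered pairs (a sum over ordered pairs $i \ne j$ counts each pair twice, which differs only by a factor of 2). The function names are hypothetical; Gini's mean difference, $f(a, b) = |a - b|$, is used as the example kernel.

```python
from itertools import combinations
import numpy as np

def pairwise_statistic(xs, f):
    """Average of f(x_i, x_j) over all unordered pairs of distinct users."""
    n = len(xs)
    total = sum(f(xs[i], xs[j]) for i, j in combinations(range(n), 2))
    return total / (n * (n - 1) / 2)

def quadratic_form_statistic(xs, f, k):
    """Same statistic, computed from the histogram of xs over {0, ..., k-1}."""
    h = np.bincount(xs, minlength=k).astype(float)
    W = np.array([[f(a, b) for b in range(k)] for a in range(k)], dtype=float)
    n = h.sum()
    # h^T W h sums f over all ordered pairs (i, j), including i == j;
    # subtract the i == j terms, then halve to get unordered pairs.
    ordered_sum = h @ W @ h - h @ np.diag(W)
    return (ordered_sum / 2) / (n * (n - 1) / 2)

def gini_kernel(a, b):
    """Kernel for Gini's mean difference."""
    return abs(a - b)

xs = [0, 2, 2, 5, 1]
direct = pairwise_statistic(xs, gini_kernel)
via_histogram = quadratic_form_statistic(xs, gini_kernel, k=6)
```

Both computations agree on this input, which shows why a private estimate of the histogram-based quadratic form yields a private estimate of the pairwise statistic.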

Paper Structure

This paper contains 14 sections, 20 theorems, 27 equations.

Key Result

Theorem 4

For any workload matrix $W$, there is a non-interactive $\varepsilon$-local DP mechanism for linear queries with $\mathrm{mMSE}$ at most $O\left(\frac{\zeta(W, n)^2}{\varepsilon^2 n}\right)$. Furthermore, any non-interactive $\varepsilon$-local DP mechanism must incur $\mathrm{mMSE}$ at least $\tild

Theorems & Definitions (35)

  • Definition 1: Quadratic Form Computation
  • Definition 2: Linear Queries
  • Definition 3: mMSE
  • Theorem 4: EdmondsNU20
  • Theorem 5: Non-interactive Algorithm
  • Theorem 6: Non-interactive Lower Bound
  • Corollary 7
  • Remark
  • Theorem 8: Interactive Algorithm
  • Theorem 9: BlasiokBNS19
  • ...and 25 more