
On Differentially Private U Statistics

Kamalika Chaudhuri, Po-Ling Loh, Shourya Pandey, Purnamrita Sarkar

TL;DR

This work tackles the problem of privately estimating $\theta = \mathbb{E}[h(X_1,\dots,X_k)]$ for i.i.d. data using U-statistics under central differential privacy. It identifies limitations of off-the-shelf private mean-estimation methods and introduces a thresholding approach based on local Hájek projections to reweight U-statistic subsets, achieving nearly optimal private error for non-degenerate sub-Gaussian kernels and strong indications of near-optimality for degenerate cases. The authors provide matching lower bounds for non-degenerate kernels and near-optimality evidence for degenerate kernels, along with a subsampling variant that reduces runtime to $O(n^2)$ without sacrificing privacy guarantees. They demonstrate applications to private hypothesis testing and sparse-graph statistics (e.g., uniformity testing, triangle densities), illustrating practical impact in settings where U-statistics naturally arise. Overall, the paper significantly advances private U-statistics by combining a novel Hájek-projection-based reweighting scheme with robust private-mean machinery, yielding practical, near-optimal privacy-utility trade-offs.
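The idea behind subsampling can be illustrated with a generic incomplete U-statistic, which averages the kernel over a random sample of $k$-subsets instead of all $\binom{n}{k}$ of them. This is a minimal sketch under stated assumptions, not the paper's subsampled family (Definition 3), whose exact construction is not reproduced here: the function name `incomplete_u_statistic` and the scheme of drawing the $m$ subsets independently (so a subset may repeat) are hypothetical choices. Taking $m$ on the order of $n^2$ keeps the number of kernel evaluations quadratic in $n$.

```python
import random
from typing import Callable, Optional, Sequence


def incomplete_u_statistic(
    xs: Sequence[float],
    h: Callable[..., float],
    k: int,
    m: int,
    rng: Optional[random.Random] = None,
) -> float:
    """Average a symmetric kernel h over m random k-subsets of xs.

    Hypothetical helper: each subset is a uniformly random k-subset of
    indices, and the m subsets are drawn independently of each other.
    This costs m kernel evaluations instead of C(n, k).
    """
    rng = rng if rng is not None else random.Random()
    n = len(xs)
    total = 0.0
    for _ in range(m):
        idx = rng.sample(range(n), k)  # one uniformly random k-subset of indices
        total += h(*(xs[i] for i in idx))
    return total / m
```

Because every sampled subset is uniform, the estimate is unbiased for the full U-statistic; the price is extra variance that shrinks as $m$ grows.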

Abstract

We consider the problem of privately estimating a parameter $\mathbb{E}[h(X_1,\dots,X_k)]$, where $X_1, X_2, \dots, X_k$ are i.i.d. data from some distribution and $h$ is a permutation-invariant function. Without privacy constraints, standard estimators are U-statistics, which commonly arise in a wide range of problems, including nonparametric signed rank tests, symmetry testing, uniformity testing, and subgraph counts in random networks, and can be shown to be minimum variance unbiased estimators under mild conditions. Despite the recent outpouring of interest in private mean estimation, privatizing U-statistics has received little attention. While existing private mean estimation algorithms can be applied to obtain confidence intervals, we show that they can lead to suboptimal private error, e.g., constant-factor inflation in the leading term, or even $\Theta(1/n)$ rather than $O(1/n^2)$ in degenerate settings. To remedy this, we propose a new thresholding-based approach using local Hájek projections to reweight different subsets of the data. This leads to nearly optimal private error for non-degenerate U-statistics and a strong indication of near-optimality for degenerate U-statistics.
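To make the objects in the abstract concrete, the brute-force sketch below computes a U-statistic by averaging $h$ over all $\binom{n}{k}$ subsets, together with, for each sample $i$, the average of $h$ over the $\binom{n-1}{k-1}$ subsets that contain $i$. Treating these per-sample averages as the "local Hájek projections" is an assumption on our part; the paper's precise definition and its thresholding and reweighting step are not reproduced. The function names are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import Callable, List, Sequence


def u_statistic(xs: Sequence[float], h: Callable[..., float], k: int) -> float:
    """Average of the symmetric kernel h over all k-element subsets of xs."""
    n = len(xs)
    total = sum(h(*(xs[i] for i in idx)) for idx in combinations(range(n), k))
    return total / comb(n, k)


def local_hajek_projections(
    xs: Sequence[float], h: Callable[..., float], k: int
) -> List[float]:
    """For each index i, average h over the C(n-1, k-1) subsets containing i.

    Assumption: this brute-force per-sample average is our reading of a
    local Hajek projection; the paper's definition may differ in details.
    """
    n = len(xs)
    sums = [0.0] * n
    for idx in combinations(range(n), k):
        value = h(*(xs[i] for i in idx))
        for i in idx:  # credit the kernel value to every member of the subset
            sums[i] += value
    return [s / comb(n - 1, k - 1) for s in sums]
```

With the kernel $h(x, y) = (x - y)^2/2$, the U-statistic is the unbiased sample variance: for `xs = [1.0, 2.0, 4.0]` it equals $7/3$, and the per-sample averages are 2.5, 1.25, and 3.25, whose mean is again $7/3$. A sample whose local average sits far from the others is the kind of point a reweighting scheme would want to down-weight.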

Paper Structure (38 sections, 37 theorems, 181 equations, 2 figures, 1 table, 7 algorithms)

Key Result

Lemma 1

Let $\mathcal{A}_i : \mathcal{X}^n \times \prod_{j=1}^{i-1} \mathcal{Y}_j \to \mathcal{Y}_i$ for $i \in [k]$ be $k$ randomized algorithms such that for any $i \in [k]$ and any $(y_1, y_2, \dots, y_{i-1}) \in \prod_{j=1}^{i-1} \mathcal{Y}_j$, the algorithm $\mathcal{A}_i(\cdot, y_1, y_2, \dots, y_{i-1})$ … where $y_i = \mathcal{A}_i(D, y_1, \dots, y_{i-1})$ for all $i \in [k]$, is $\sum_{i=1}^k \epsilon_i$ …
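Basic composition is what allows a multi-stage private estimator, for example one that first releases a private scale estimate and then uses it in a later stage, to be analyzed by adding up per-stage budgets. The statement above is truncated, so this minimal sketch assumes the pure-DP reading its visible part suggests: each stage is an $\epsilon_i$-differentially private Laplace mechanism and the total cost is $\sum_i \epsilon_i$. The class `BasicCompositionAccountant` and its methods are hypothetical.

```python
import random
from dataclasses import dataclass, field


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. Exponential(rate = 1/scale) draws is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


@dataclass
class BasicCompositionAccountant:
    """Hypothetical accountant: adds up epsilons under basic composition."""

    budget: float
    spent: float = 0.0
    rng: random.Random = field(default_factory=random.Random)

    def laplace(self, value: float, sensitivity: float, epsilon: float) -> float:
        """Release value + Laplace(sensitivity / epsilon) noise and charge epsilon.

        The later stage may depend on earlier releases (adaptive composition);
        the total privacy cost is still the sum of the charged epsilons.
        """
        if self.spent + epsilon > self.budget + 1e-12:
            raise ValueError("privacy budget exceeded")
        self.spent += epsilon
        return value + laplace_noise(sensitivity / epsilon, self.rng)
```

A second call may take the first call's noisy output as input, mirroring how $\mathcal{A}_i$ receives $y_1, \dots, y_{i-1}$; the accountant refuses any release that would push the running sum past the budget.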

Figures (2)

  • Figure 1: Conditional probability of a triangle given that two vertices are at distance $r_n$.
  • Figure A.1: Weighting scheme in Eq. (eq:weight)

Theorems & Definitions (67)

  • Lemma 1: Basic composition
  • Lemma 2: Parallel composition
  • Lemma 3: Global sensitivity mechanism (Dwork et al., 2006)
  • Lemma 4: Smoothed sensitivity mechanism (Nissim et al., 2007)
  • Definition 1
  • Proposition 1
  • Remark 1
  • Definition 2: All-tuples family
  • Definition 3: Subsampled Family
  • Proposition 2
  • ...and 57 more
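As an illustration of the global sensitivity mechanism (Lemma 3) applied to a U-statistic, the sketch below clips the kernel to an assumed range $[a, b]$ and adds Laplace noise. Replacing one data point changes only the $\binom{n-1}{k-1} = \frac{k}{n}\binom{n}{k}$ terms that contain it, each by at most $b - a$, so the global sensitivity is $k(b-a)/n$. This is only a generic off-the-shelf baseline under that clipping assumption, not the paper's Hájek-projection-based estimator; all function names are hypothetical.

```python
import random
from itertools import combinations
from math import comb
from typing import Callable, Optional, Sequence


def clipped_u_statistic(
    xs: Sequence[float], h: Callable[..., float], k: int, lo: float, hi: float
) -> float:
    """U-statistic with every kernel value clipped to [lo, hi]."""
    n = len(xs)
    total = sum(
        min(max(h(*(xs[i] for i in idx)), lo), hi)
        for idx in combinations(range(n), k)
    )
    return total / comb(n, k)


def u_stat_sensitivity(n: int, k: int, lo: float, hi: float) -> float:
    # A fraction k/n of the terms involve any given point, each moving by at most hi - lo.
    return k * (hi - lo) / n


def private_u_statistic(
    xs: Sequence[float],
    h: Callable[..., float],
    k: int,
    lo: float,
    hi: float,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Laplace mechanism: clipped U-statistic plus Laplace(sensitivity / epsilon) noise."""
    rng = rng if rng is not None else random.Random()
    scale = u_stat_sensitivity(len(xs), k, lo, hi) / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)  # Laplace(0, scale)
    return clipped_u_statistic(xs, h, k, lo, hi) + noise
```

The noise scale $k(b-a)/(n\epsilon)$ depends only on the worst-case kernel range; the paper's reweighting scheme, which is not reproduced here, is designed to improve on off-the-shelf approaches of this kind.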