Private Mean Estimation with Person-Level Differential Privacy

Sushant Agarwal, Gautam Kamath, Mahbod Majid, Argyris Mouzakis, Rose Silver, Jonathan Ullman

TL;DR

This paper tackles private mean estimation under person-level differential privacy when each person holds multiple samples. It introduces a robust clip-and-noise framework extended to high dimensions via Threaded Clip-and-Noise, and provides tight upper and lower bounds on the number of people needed to estimate the mean within distance α under DP, scaling with dimension d, per-person sample size m, privacy ε, and the moment bound k. The core contributions include tight univariate and multivariate bounds, a novel high-dimensional tail bound for averages of bounded-moment vectors, and a coarse-to-fine strategy that leverages private histograms and iterative refinement to achieve near-optimal sample complexity under approximate-DP, with additional pure-DP results that are computationally harder. The results have implications for federated and privacy-preserving data analysis where individuals contribute multiple data points, clarifying how privacy budgets and tail behavior interact to set feasible data requirements. Overall, the work advances practical private mean estimation with heavy-tailed data, providing both efficient approximate-DP procedures and fundamental limits under DP.
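For orientation, the basic univariate clip-and-noise idea at person level can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the function name, the clipping center `center`, and the radius `tau` are hypothetical and assumed to be supplied from outside (the paper obtains its clipping region privately through a coarse-to-fine procedure). Each person is reduced to the average of their own samples, that average is clipped, and Laplace noise calibrated to the effect of replacing one person is added.

```python
import numpy as np

def person_level_clip_and_noise(data, center, tau, epsilon, rng=None):
    """Sketch of a person-level epsilon-DP mean estimate (univariate).

    data    : array of shape (n, m), one row of m samples per person.
    center  : hypothetical clipping center, assumed chosen without
              looking at the private data (or chosen privately).
    tau     : hypothetical clipping radius.
    epsilon : privacy parameter.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    n = data.shape[0]

    # Each person contributes a single number: the average of their samples.
    person_means = data.mean(axis=1)

    # Clip every per-person average into [center - tau, center + tau].
    clipped = np.clip(person_means, center - tau, center + tau)

    # Replacing all of one person's samples moves the clipped average
    # over people by at most 2 * tau / n.
    sensitivity = 2.0 * tau / n

    # Laplace mechanism with scale sensitivity / epsilon.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return clipped.mean() + noise
```

Clipping the per-person average rather than each individual sample is what makes the guarantee person-level: the noise scale depends only on how far one person's clipped average can move, regardless of how many of their samples change.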

Abstract

We study person-level differentially private (DP) mean estimation in the case where each person holds multiple samples. DP here requires the usual notion of distributional stability when $\textit{all}$ of a person's datapoints can be modified. Informally, if $n$ people each have $m$ samples from an unknown $d$-dimensional distribution with bounded $k$-th moments, we show that \[n = \tilde\Theta\left(\frac{d}{\alpha^2 m} + \frac{d}{\alpha m^{1/2} \varepsilon} + \frac{d}{\alpha^{k/(k-1)} m \varepsilon} + \frac{d}{\varepsilon}\right)\] people are necessary and sufficient to estimate the mean up to distance $\alpha$ in $\ell_2$-norm under $\varepsilon$-differential privacy (and its common relaxations). In the multivariate setting, we give computationally efficient algorithms under approximate-DP and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds hold for the most permissive case of approximate DP. Our computationally efficient estimators are based on the standard clip-and-noise framework, but the analysis for our setting requires both new algorithmic techniques and new analyses. In particular, our new bounds on the tails of sums of independent, vector-valued, bounded-moments random variables may be of interest.
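To get a feel for how the four terms in the stated rate trade off, the following sketch evaluates each one for given parameters. It drops the constants and logarithmic factors hidden by the $\tilde\Theta$ notation, so the numbers indicate only relative scale and not actual sample sizes; the function name and dictionary keys are illustrative choices, not notation from the paper.

```python
def people_needed_terms(d, m, alpha, epsilon, k):
    """Evaluate the four terms of the stated rate.

    Constants and log factors are dropped, so values are only
    meaningful relative to one another.
    """
    terms = {
        "d/(alpha^2 m)": d / (alpha**2 * m),
        "d/(alpha m^(1/2) eps)": d / (alpha * m**0.5 * epsilon),
        "d/(alpha^(k/(k-1)) m eps)": d / (alpha ** (k / (k - 1)) * m * epsilon),
        "d/eps": d / epsilon,
    }
    # The rate is a sum of terms; up to a factor of 4 it is driven by the largest.
    terms["sum"] = sum(terms.values())
    return terms


if __name__ == "__main__":
    # Example parameters chosen purely for illustration.
    for name, value in people_needed_terms(d=10, m=100, alpha=0.1, epsilon=1.0, k=4).items():
        print(f"{name:28s} {value:10.3f}")
```

Note that the last term, $d/\varepsilon$, does not depend on $m$, so in this rate giving each person more samples cannot push the required number of people below that level.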

Paper Structure

This paper contains 36 sections, 54 theorems, 127 equations, 8 algorithms.

Key Result

Theorem 1.1

There is a computationally efficient person-level $\varepsilon$-DP estimator such that for every distribution $\mathcal{D}$ over $\mathbb R$ with mean $\mu$ and bounded $k$-th moments, the estimator takes $m$ samples per person from … people …

(Footnote: For simplicity, throughout this introduction we elide any assumptions that $|\mu| \leqslant R$ and any dependence on this parameter $R$, which is necessary for …)

Theorems & Definitions (89)

  • Theorem 1.1: Informal, see Corollary \ref{cor:mean-estimation-bounded}
  • Theorem 1.2: Informal, see Theorem \ref{thm:approx-dp-upperbound}
  • Theorem 1.3: Informal, see Theorem \ref{thm:mean_est_pure_dp}
  • Theorem 1.4: Informal, see Theorem \ref{thm:main_lb_approx_dp}
  • Definition 2.1: Item-Level Differential Privacy
  • Definition 2.2: Person-Level Differential Privacy
  • Lemma 2.3: Post-Processing
  • Lemma 2.4: Basic Composition
  • Lemma 2.5: Advanced Composition
  • Definition 2.6
  • ...and 79 more