Private Mean Estimation with Person-Level Differential Privacy

Sushant Agarwal; Gautam Kamath; Mahbod Majid; Argyris Mouzakis; Rose Silver; Jonathan Ullman

TL;DR

This paper tackles private mean estimation under person-level differential privacy when each person holds multiple samples. It introduces a robust clip-and-noise framework extended to high dimensions via Threaded Clip-and-Noise, and provides tight upper and lower bounds on the number of people needed to estimate the mean within distance α under DP, scaling with dimension d, per-person sample size m, privacy ε, and the moment bound k. The core contributions include tight univariate and multivariate bounds, a novel high-dimensional tail bound for averages of bounded-moment vectors, and a coarse-to-fine strategy that leverages private histograms and iterative refinement to achieve near-optimal sample complexity under approximate-DP, with additional pure-DP results that are computationally harder. The results have implications for federated and privacy-preserving data analysis where individuals contribute multiple data points, clarifying how privacy budgets and tail behavior interact to set feasible data requirements. Overall, the work advances practical private mean estimation with heavy-tailed data, providing both efficient approximate-DP procedures and fundamental limits under DP.

Abstract

We study person-level differentially private (DP) mean estimation in the case where each person holds multiple samples. DP here requires the usual notion of distributional stability when $\textit{all}$ of a person's datapoints can be modified. Informally, if $n$ people each have $m$ samples from an unknown $d$-dimensional distribution with bounded $k$-th moments, we show that \[n = \tilde{\Theta}\left(\frac{d}{\alpha^2 m} + \frac{d}{\alpha m^{1/2} \varepsilon} + \frac{d}{\alpha^{k/(k-1)} m \varepsilon} + \frac{d}{\varepsilon}\right)\] people are necessary and sufficient to estimate the mean up to distance $\alpha$ in $\ell_2$-norm under $\varepsilon$-differential privacy (and its common relaxations). In the multivariate setting, we give computationally efficient algorithms under approximate-DP and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds hold for the most permissive case of approximate DP. Our computationally efficient estimators are based on the standard clip-and-noise framework, but the analysis for our setting requires both new algorithmic techniques and new analyses. In particular, our new bounds on the tails of sums of independent, vector-valued, bounded-moments random variables may be of interest.
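To get a feel for how the four terms of the stated bound trade off, the following minimal sketch evaluates each one numerically. It drops all constants and logarithmic factors hidden by the tilde notation, so the outputs indicate scaling only, not actual sample sizes. The function names and the dictionary keys are illustrative labels chosen here; they are not terminology from the paper.

```python
import math


def people_needed_terms(d, m, alpha, eps, k):
    """Evaluate the four terms of the abstract's bound on n.

    Constants and log factors are ignored, so these are scaling
    indicators only. The key names are illustrative labels.
    """
    if k <= 1:
        raise ValueError("the exponent k/(k-1) requires k > 1")
    if min(d, m, alpha, eps) <= 0:
        raise ValueError("d, m, alpha and eps must be positive")
    return {
        # d / (alpha^2 m): the only term with no dependence on eps
        "no_eps_term": d / (alpha ** 2 * m),
        # d / (alpha sqrt(m) eps)
        "sqrt_m_term": d / (alpha * math.sqrt(m) * eps),
        # d / (alpha^{k/(k-1)} m eps): the only term that depends on k
        "moment_term": d / (alpha ** (k / (k - 1)) * m * eps),
        # d / eps: independent of both alpha and m
        "d_over_eps_term": d / eps,
    }


def people_needed(d, m, alpha, eps, k):
    """Sum of the four terms (still up to constants and log factors)."""
    return sum(people_needed_terms(d, m, alpha, eps, k).values())


if __name__ == "__main__":
    for name, value in people_needed_terms(d=10, m=100, alpha=0.1, eps=1.0, k=2).items():
        print(f"{name}: {value:.3f}")
```

Note that increasing the per-person sample count `m` shrinks the first three terms but leaves `d / eps` untouched, which matches the form of the bound in the abstract.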

Paper Structure (36 sections, 54 theorems, 127 equations, 8 algorithms)

Introduction
Results and Techniques
Technical Overview of Theorems 1.1 and 1.2
Related Work
Independent work of Zhao et al. [ZhaoLSLWL24]
Preliminaries
Privacy Preliminaries
Bounded Moments
Private Mean Estimation in One Dimension
Preliminaries
Generic Clip and Noise Theorem
Coarse Estimation
Applying Clip and Noise to Distributions with Bounded k-th Moments
Mean Estimation in High Dimensions with Approximate-DP
Using Clip-and-Noise in $d$ dimensions.
...and 21 more sections

Key Result

Theorem 1.1

There is a computationally efficient person-level $\varepsilon$-DP estimator such that for every distribution $\mathcal{D}$ over $\mathbb R$ with mean $\mu$ and bounded $k$-th moments, the estimator takes $m$ samples per person from people

Footnote: For simplicity, throughout this introduction we elide any assumptions that $|\mu| \leqslant R$ and any dependence on this parameter $R$, which is necessary for
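The abstract names the standard clip-and-noise framework as the basis of the efficient estimators. As a rough illustration of that generic idea in one dimension (not the paper's algorithm), the sketch below averages each person's samples, clips the per-person averages to a fixed interval, and adds Laplace noise calibrated to the effect of replacing one person. It assumes the clipping interval `[lo, hi]` is chosen independently of the data and that the number of people is public; choosing the interval privately is a separate step that this sketch does not attempt. The function name and parameters are hypothetical.

```python
import numpy as np


def clip_and_noise_mean(samples, lo, hi, eps, rng=None):
    """Person-level eps-DP mean estimate via generic clip-and-noise (1-D sketch).

    samples: array of shape (n, m), row i holding person i's m samples.
    lo, hi:  data-independent clipping interval (an assumption of this sketch).
    eps:     privacy parameter for pure DP.
    """
    if not (hi > lo) or eps <= 0:
        raise ValueError("need hi > lo and eps > 0")
    rng = np.random.default_rng() if rng is None else rng
    samples = np.asarray(samples, dtype=float)
    n = samples.shape[0]

    # Collapse each person's samples to one number, so a person
    # influences the statistic only through a single value.
    person_means = samples.mean(axis=1)

    # Clip so that replacing all of one person's samples moves that
    # person's value by at most (hi - lo).
    clipped = np.clip(person_means, lo, hi)

    # Replacing one person changes the average of n clipped values by at
    # most (hi - lo) / n; Laplace noise of scale sensitivity / eps then
    # gives eps-DP by the Laplace mechanism.
    sensitivity = (hi - lo) / n
    noise = rng.laplace(loc=0.0, scale=sensitivity / eps)
    return float(clipped.mean() + noise)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(loc=3.0, scale=1.0, size=(2000, 20))
    print(clip_and_noise_mean(data, lo=0.0, hi=6.0, eps=1.0, rng=rng))
```

A tighter interval means less noise but more clipping bias, which is why the width of the interval matters so much for accuracy.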

Theorems & Definitions (89)

Theorem 1.1: Informal, see Corollary \ref{cor:mean-estimation-bounded}
Theorem 1.2: Informal, see Theorem \ref{thm:approx-dp-upperbound}
Theorem 1.3: Informal, see Theorem \ref{thm:mean_est_pure_dp}
Theorem 1.4: Informal, see Theorem \ref{thm:main_lb_approx_dp}
Definition 2.1: Item-Level Differential Privacy
Definition 2.2: Person-Level Differential Privacy
Lemma 2.3: Post-Processing
Lemma 2.4: Basic Composition
Lemma 2.5: Advanced Composition
Definition 2.6
...and 79 more
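
For orientation, the person-level notion listed above can be written in the usual form of the differential privacy guarantee. This is the standard textbook formulation, given here as a sketch; the paper's own Definition 2.2 may be phrased differently. The symbols $M$ (the mechanism), $\mathcal{X}$ (the sample domain), $D, D'$ (datasets), and $S$ (an output event) are notation assumed here.

```latex
% A dataset is D = (X_1, \dots, X_n), where X_i \in \mathcal{X}^m holds
% person i's m samples. D and D' are person-level neighbors if they
% differ in at most one X_i; all m samples of that person may change.
\[
  \Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
  \qquad \text{for all person-level neighbors } D, D'
  \text{ and all measurable } S .
\]
% Pure DP is the case \delta = 0; approximate DP allows \delta > 0.
% Item-level DP uses the same inequality, but neighbors differ in a
% single sample rather than in one person's entire batch.
```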
