Covariance-Aware Private Mean Estimation Without Private Covariance Estimation

Gavin Brown, Marco Gaboardi, Adam Smith, Jonathan Ullman, Lydia Zakynthinou

TL;DR

This work addresses differentially private mean estimation in high dimensions when the covariance Σ is unknown. It introduces two sample-efficient estimators that achieve Mahalanobis-distance accuracy ||μ̂−μ||_Σ ≤ α without first estimating the covariance privately. The first is an exponential mechanism based on Tukey depth, restricted to points of high depth and guarded by a private safety test; it attains near-optimal sample complexity for Gaussian data, and its accuracy guarantee tolerates a small amount of adversarial corruption. The second is an empirically rescaled Gaussian mechanism for subgaussian data: it perturbs the empirical mean with noise calibrated to the empirical covariance Σ̂ rather than with spherical noise, does not release Σ̂ itself, and uses a private projection onto a class of “good” data sets to ensure privacy. Together, the two estimators bypass the sample cost of private covariance estimation and offer complementary trade-offs in robustness and distributional assumptions.

Abstract

We present two sample-efficient differentially private mean estimators for $d$-dimensional (sub)Gaussian distributions with unknown covariance. Informally, given $n \gtrsim d/\alpha^2$ samples from such a distribution with mean $\mu$ and covariance $\Sigma$, our estimators output $\tilde{\mu}$ such that $\| \tilde{\mu} - \mu \|_\Sigma \leq \alpha$, where $\| \cdot \|_\Sigma$ is the Mahalanobis distance. All previous estimators with the same guarantee either require strong a priori bounds on the covariance matrix or require $\Omega(d^{3/2})$ samples. Each of our estimators is based on a simple, general approach to designing differentially private mechanisms, but with novel technical steps to make the estimator private and sample-efficient. Our first estimator samples a point with approximately maximum Tukey depth using the exponential mechanism, but restricted to the set of points of large Tukey depth. Its accuracy guarantees hold even for data sets that have a small amount of adversarial corruption. Proving that this mechanism is private requires a novel analysis. Our second estimator perturbs the empirical mean of the data set with noise calibrated to the empirical covariance, without releasing the covariance itself. Its sample complexity guarantees hold more generally for subgaussian distributions, albeit with a slightly worse dependence on the privacy parameter. For both estimators, careful preprocessing of the data is required to satisfy differential privacy.
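
The first estimator scores candidate outputs by their Tukey depth. The sketch below illustrates only that score, not the private mechanism: it upper-bounds the depth of a point by checking random directions instead of minimizing over all of them. The function name `approx_tukey_depth`, the number of directions, and the random-direction shortcut are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def approx_tukey_depth(y, x, num_directions=2000, seed=0):
    """Upper-bound the Tukey depth of point y in data set x (rows are points).

    The exact depth minimizes over all directions. Sampling random
    directions (an illustrative shortcut) can only overestimate it.
    """
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((num_directions, x.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

    proj_x = x @ dirs.T   # shape (n, num_directions)
    proj_y = dirs @ y     # shape (num_directions,)

    # Count the points on each closed side of the hyperplane through y.
    at_or_above = (proj_x >= proj_y).sum(axis=0)
    at_or_below = (proj_x <= proj_y).sum(axis=0)
    return int(min(at_or_above.min(), at_or_below.min()))


rng = np.random.default_rng(1)
data = rng.standard_normal((200, 2))
central_depth = approx_tukey_depth(data.mean(axis=0), data)
outlier_depth = approx_tukey_depth(np.array([100.0, 100.0]), data)
```

A point near the center of the data has depth close to half the sample size, while a point far outside the data has depth zero, which is why points of large depth are good candidates for a mean estimate.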

Paper Structure

This paper contains 34 sections, 56 theorems, 161 equations, 8 algorithms.

Key Result

Theorem 1.1

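For $\alpha \leq 1$, there is an $(\varepsilon,\delta)$-differentially private estimator $\mathcal{A}(\cdot)$ such that if $x = (x_1,\dots,x_n)$ are sampled from $\mathcal{N}(\mu,\Sigma)$ for unknown $\mu$ and $\Sigma$ of full rank, and $n$ is sufficiently large (informally, $n \gtrsim d/\alpha^2$, up to additional terms depending on the privacy parameters $\varepsilon$ and $\delta$), then $\|\mathcal{A}(x) - \mu\|_\Sigma \leq \alpha$. The above guarantee holds with high probability over the sample $x$ and the randomness of $\mathcal{A}$. Here $\gtrsim$ hides a universal multiplicative constant and $\|\cdot\|_\Sigma$ is the Mahalanobis distance.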
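
To make the error measure concrete, the following sketch computes the Mahalanobis distance $\|u - v\|_\Sigma = \sqrt{(u-v)^\top \Sigma^{-1} (u-v)}$ and adds noise shaped by the empirical covariance to the empirical mean, which is the core idea behind the second estimator. The names `mahalanobis_distance`, `covariance_shaped_mean`, and the parameter `noise_scale` are hypothetical. The sketch is not differentially private as written: it omits the preprocessing the paper requires, and it does not derive the noise scale from $(\varepsilon, \delta)$.

```python
import numpy as np


def mahalanobis_distance(u, v, cov):
    """Return sqrt((u - v)^T cov^{-1} (u - v)) for a full-rank covariance."""
    diff = u - v
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


def covariance_shaped_mean(x, noise_scale, rng):
    """Empirical mean plus N(0, noise_scale^2 * empirical covariance) noise.

    Illustration only: NOT private without the paper's preprocessing,
    and noise_scale is a free (hypothetical) parameter here.
    """
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    chol = np.linalg.cholesky(cov)          # cov = chol @ chol.T
    z = rng.standard_normal(x.shape[1])
    return mean + noise_scale * (chol @ z)


rng = np.random.default_rng(0)
true_cov = np.array([[4.0, 1.5], [1.5, 1.0]])
samples = rng.multivariate_normal(np.zeros(2), true_cov, size=500)
estimate = covariance_shaped_mean(samples, noise_scale=0.05, rng=rng)
error = mahalanobis_distance(estimate, np.zeros(2), true_cov)
```

The Mahalanobis distance is unchanged when the data and the covariance are transformed by the same invertible linear map, so an error bound in this norm is meaningful even when the covariance is badly conditioned. Noise shaped by the empirical covariance has the same property, whereas spherical noise does not.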

Theorems & Definitions (99)

  • Theorem 1.1: Informal
  • Theorem 1.2: Informal
  • Definition 2.1: $(\varepsilon,\delta)$-indistinguishability
  • Definition 2.2: Differential Privacy [DworkMNS06]
  • Lemma 2.3: Composition [DworkMNS06]
  • Definition 2.4: Laplace Mechanism [DworkMNS06]
  • Lemma 2.5: [DworkMNS06]
  • Definition 2.6: Gaussian Mechanism [DworkMNS06]
  • Lemma 2.7: [DworkMNS06]
  • Definition 3.1: Safety
  • ...and 89 more
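
The Laplace and Gaussian mechanisms named in the list above are standard building blocks. As a minimal sketch, the code below uses the textbook calibrations: Laplace noise with scale $\Delta_1/\varepsilon$, and Gaussian noise with $\sigma = \Delta_2 \sqrt{2 \ln(1.25/\delta)}/\varepsilon$, which is valid for $0 < \varepsilon < 1$. The constants in the paper's own statements may differ, and the function names and sensitivity parameters are assumptions made for illustration.

```python
import math

import numpy as np


def laplace_mechanism(value, l1_sensitivity, epsilon, rng):
    """Add Laplace noise with scale l1_sensitivity / epsilon per coordinate."""
    value = np.asarray(value, dtype=float)
    scale = l1_sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale, size=value.shape)


def gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Textbook noise level for (epsilon, delta)-DP, assuming 0 < epsilon < 1."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("this calibration assumes 0 < epsilon < 1 and 0 < delta < 1")
    return l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def gaussian_mechanism(value, l2_sensitivity, epsilon, delta, rng):
    """Add spherical Gaussian noise calibrated to the L2 sensitivity."""
    value = np.asarray(value, dtype=float)
    sigma = gaussian_sigma(l2_sensitivity, epsilon, delta)
    return value + rng.normal(loc=0.0, scale=sigma, size=value.shape)


rng = np.random.default_rng(0)
query_answer = np.array([0.2, -0.1, 0.4])
noisy_laplace = laplace_mechanism(query_answer, l1_sensitivity=0.01, epsilon=0.5, rng=rng)
noisy_gauss = gaussian_mechanism(query_answer, l2_sensitivity=0.01, epsilon=0.5, delta=1e-6, rng=rng)
```

Both mechanisms add noise whose scale depends on a worst-case sensitivity bound, which for the empirical mean requires a priori knowledge of the data's scale. The covariance-aware estimators in the paper are designed to avoid exactly that requirement.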