Better and Simpler Lower Bounds for Differentially Private Statistical Estimation

Shyam Narayanan

TL;DR

The paper proves lower bounds, optimal up to logarithmic factors, for high-dimensional covariance and mean estimation under approximate differential privacy. It combines the fingerprinting method with a Bayesian argument under an Inverse Wishart prior over the covariance to show that private estimators need at least $n \ge \tilde{\Omega}\left(\frac{d}{\alpha^2} + \frac{d^{3/2}}{\alpha \varepsilon}\right)$ samples to estimate a Gaussian covariance up to spectral error $\alpha$, and $n \ge \tilde{\Omega}\left(\frac{d}{\alpha^{k/(k-1)} \varepsilon} + \frac{d}{\alpha^2}\right)$ samples for heavy-tailed mean estimation with bounded $k$th moments. The fingerprinting approach keeps the arguments simple, and the results extend and tighten prior work, including improvements for empirical covariance estimation. A key implication is a dimension-based separation between robustness and privacy: robust spectral covariance estimation can be statistically easier than private spectral covariance estimation in high dimensions. Overall, the lower bounds match existing upper bounds up to logarithmic factors, have comparatively simple proofs, and sharpen our understanding of privacy-robustness trade-offs in high-dimensional statistical estimation.

Abstract

We provide optimal lower bounds for two well-known parameter estimation (also known as statistical estimation) tasks in high dimensions with approximate differential privacy. First, we prove that for any $\alpha \le O(1)$, estimating the covariance of a Gaussian up to spectral error $\alpha$ requires $\tilde{\Omega}\left(\frac{d^{3/2}}{\alpha \varepsilon} + \frac{d}{\alpha^2}\right)$ samples, which is tight up to logarithmic factors. This result improves over previous work which established this for $\alpha \le O\left(\frac{1}{\sqrt{d}}\right)$, and is also simpler than previous work. Next, we prove that estimating the mean of a heavy-tailed distribution with bounded $k$th moments requires $\tilde{\Omega}\left(\frac{d}{\alpha^{k/(k-1)} \varepsilon} + \frac{d}{\alpha^2}\right)$ samples. Previous work for this problem was only able to establish this lower bound against pure differential privacy, or in the special case of $k = 2$. Our techniques follow the method of fingerprinting and are generally quite simple. Our lower bound for heavy-tailed estimation is based on a black-box reduction from privately estimating identity-covariance Gaussians. Our lower bound for covariance estimation utilizes a Bayesian approach to show that, under an Inverse Wishart prior distribution for the covariance matrix, no private estimator can be accurate even in expectation, without sufficiently many samples.
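
To get a feel for how the two bounds in the abstract scale, the following is a minimal Python sketch that evaluates them numerically. It is only an illustration under stated assumptions: the function names are hypothetical, and the hidden constants and logarithmic factors inside $\tilde{\Omega}(\cdot)$ are dropped (treated as 1), so the outputs indicate scaling in $d$, $\alpha$, $\varepsilon$, and $k$ rather than actual sample sizes.

```python
def covariance_lower_bound(d, alpha, eps):
    """Scaling of the Gaussian covariance bound d^{3/2}/(alpha*eps) + d/alpha^2.

    Assumption: constants and log factors hidden by the tilde-Omega are set to 1.
    """
    privacy_term = d ** 1.5 / (alpha * eps)   # cost attributable to privacy
    statistical_term = d / alpha ** 2         # cost present even without privacy
    return privacy_term + statistical_term


def heavy_tailed_mean_lower_bound(d, alpha, eps, k):
    """Scaling of the heavy-tailed mean bound d/(alpha^{k/(k-1)}*eps) + d/alpha^2.

    Assumption: k > 1 so the exponent k/(k-1) is well defined; constants and
    log factors are again set to 1.
    """
    if k <= 1:
        raise ValueError("k must exceed 1 for the exponent k/(k-1) to be defined")
    privacy_term = d / (alpha ** (k / (k - 1)) * eps)
    statistical_term = d / alpha ** 2
    return privacy_term + statistical_term


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    for d in (10, 100, 1000):
        cov = covariance_lower_bound(d, alpha=0.1, eps=1.0)
        mean = heavy_tailed_mean_lower_bound(d, alpha=0.1, eps=1.0, k=4)
        print(f"d={d}: covariance ~ {cov:.0f}, heavy-tailed mean (k=4) ~ {mean:.0f}")
```

Varying `d` with `alpha` and `eps` fixed shows the covariance bound growing like $d^{3/2}$ while the mean bound grows linearly in $d$.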

Paper Structure (23 sections, 28 theorems, 49 equations)


Key Result

Theorem 1.2

For any $\alpha, \varepsilon \le O(1)$ and any $\delta \le \left(\frac{\alpha \cdot \varepsilon}{d}\right)^{O(1)}$, any $(\varepsilon, \delta)$-DP algorithm that solves covariance estimation up to spectral error $\alpha$ for Gaussians in $d$ dimensions requires sample complexity $n \ge \tilde{\Omega}\left(\frac{d^{3/2}}{\alpha \varepsilon} + \frac{d}{\alpha^2}\right)$.
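
As a short worked comparison (a sketch that ignores constants and logarithmic factors), the two terms of the covariance bound stated in the abstract, $\frac{d^{3/2}}{\alpha \varepsilon}$ and $\frac{d}{\alpha^2}$, can be compared directly to see when the privacy term dominates:

```latex
\[
\frac{d^{3/2}}{\alpha \varepsilon} \;\ge\; \frac{d}{\alpha^{2}}
\quad\Longleftrightarrow\quad
\alpha \sqrt{d} \;\ge\; \varepsilon
\quad\Longleftrightarrow\quad
\alpha \;\ge\; \frac{\varepsilon}{\sqrt{d}} .
\]
% Illustrative values (not from the paper): d = 10^4, alpha = 0.1, epsilon = 1.
\[
\frac{d^{3/2}}{\alpha \varepsilon} = \frac{10^{6}}{0.1} = 10^{7},
\qquad
\frac{d}{\alpha^{2}} = \frac{10^{4}}{10^{-2}} = 10^{6}.
\]
```

So for constant $\varepsilon$, the privacy term is the larger one once $\alpha$ exceeds roughly $1/\sqrt{d}$, which is the range of $\alpha$ not covered by the earlier results mentioned in the abstract.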

Theorems & Definitions (49)

  • Definition 1.1
  • Theorem 1.2: Informal, see \ref{thm:covariance-formal}
  • Theorem 1.3: Informal, see \ref{thm:heavy-tailed-formal}
  • Lemma 3.1: Folklore
  • Theorem 3.2
  • Definition 3.3: Wishart Distribution
  • Definition 3.4: Inverse Wishart Distribution
  • Proposition 3.5
  • Proposition 3.6
  • Lemma 3.7
  • ...and 39 more