Better and Simpler Lower Bounds for Differentially Private Statistical Estimation

Shyam Narayanan

TL;DR

The paper derives lower bounds, optimal up to logarithmic factors, for high-dimensional mean and covariance estimation under approximate differential privacy. It combines fingerprinting-based arguments with a Bayesian (Inverse Wishart) prior over the covariance to show that private estimators need at least $n \ge \tilde{\Omega}\left(\frac{d}{\alpha^2} + \frac{d^{3/2}}{\alpha \varepsilon}\right)$ samples to estimate a Gaussian covariance up to spectral error $\alpha$, and $n \ge \tilde{\Omega}\left(\frac{d}{\alpha^{k/(k-1)} \varepsilon} + \frac{d}{\alpha^2}\right)$ samples for heavy-tailed mean estimation with bounded $k$th moments. The fingerprinting approach yields simple proofs of these lower bounds, and the results extend and tighten prior work, including improvements for empirical covariance estimation. A key implication is a dimension-based separation between robustness and privacy: robust spectral covariance estimation can be statistically easier than private spectral covariance estimation in high dimensions. Overall, the lower bounds are near-optimal, have simple proofs, match existing upper bounds, and sharpen our understanding of privacy-robustness trade-offs in high-dimensional statistical estimation.

Abstract

We provide optimal lower bounds for two well-known parameter estimation (also known as statistical estimation) tasks in high dimensions with approximate differential privacy. First, we prove that for any $\alpha \le O(1)$, estimating the covariance of a Gaussian up to spectral error $\alpha$ requires $\tilde{\Omega}\left(\frac{d^{3/2}}{\alpha \varepsilon} + \frac{d}{\alpha^2}\right)$ samples, which is tight up to logarithmic factors. This result improves over previous work which established this for $\alpha \le O\left(\frac{1}{\sqrt{d}}\right)$, and is also simpler than previous work. Next, we prove that estimating the mean of a heavy-tailed distribution with bounded $k$th moments requires $\tilde{\Omega}\left(\frac{d}{\alpha^{k/(k-1)} \varepsilon} + \frac{d}{\alpha^2}\right)$ samples. Previous work for this problem was only able to establish this lower bound against pure differential privacy, or in the special case of $k = 2$. Our techniques follow the method of fingerprinting and are generally quite simple. Our lower bound for heavy-tailed estimation is based on a black-box reduction from privately estimating identity-covariance Gaussians. Our lower bound for covariance estimation utilizes a Bayesian approach to show that, under an Inverse Wishart prior distribution for the covariance matrix, no private estimator can be accurate even in expectation, without sufficiently many samples.
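To get a feel for how the two rates in the abstract scale, the following minimal sketch evaluates their leading-order terms. It is only an illustration: the function names are hypothetical, and the hidden constants and logarithmic factors inside $\tilde{\Omega}(\cdot)$ are dropped, so the returned values indicate scaling rather than actual sample counts.

```python
def gaussian_covariance_rate(d: float, alpha: float, eps: float) -> float:
    """Leading-order rate d/alpha^2 + d^(3/2)/(alpha*eps).

    Assumption: constants and log factors hidden by the tilde-Omega are dropped.
    """
    statistical_term = d / alpha**2           # non-private sampling cost
    privacy_term = d**1.5 / (alpha * eps)     # additional cost of privacy
    return statistical_term + privacy_term


def heavy_tailed_mean_rate(d: float, alpha: float, eps: float, k: float) -> float:
    """Leading-order rate d/(alpha^(k/(k-1))*eps) + d/alpha^2 for bounded k-th moments.

    Assumption: k >= 2, and constants and log factors are dropped.
    """
    if k < 2:
        raise ValueError("this sketch assumes k >= 2")
    privacy_term = d / (alpha ** (k / (k - 1)) * eps)
    statistical_term = d / alpha**2
    return privacy_term + statistical_term


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the paper.
    print(gaussian_covariance_rate(d=100, alpha=0.1, eps=1.0))
    print(heavy_tailed_mean_rate(d=10, alpha=0.5, eps=1.0, k=2))
```

As $k$ grows, the exponent $k/(k-1)$ decreases toward 1, so for $\alpha < 1$ the privacy term of the heavy-tailed rate shrinks toward $d/(\alpha \varepsilon)$.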

Paper Structure (23 sections, 28 theorems, 49 equations)

Introduction
This work
Implications:
Additional related work
Roadmap
Proof Overview
Fingerprinting overview:
Spectral covariance estimation:
Heavy-tailed mean estimation:
Preliminaries
Statistical estimation:
Wishart distributions:
Concentration bounds:
Lower Bound for Private Gaussian Covariance Estimation
Setup and assumptions
...and 8 more sections

Key Result

Theorem 1.2

For any $\alpha, \varepsilon \le O(1)$ and any $\delta \le \left(\frac{\alpha \cdot \varepsilon}{d}\right)^{O(1)}$, any $(\varepsilon, \delta)$-DP algorithm that solves covariance estimation up to spectral error $\alpha$ for Gaussians in $d$ dimensions requires sample complexity $n \ge \tilde{\Omega}\left(\frac{d}{\alpha^2} + \frac{d^{3/2}}{\alpha \varepsilon}\right)$.
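As a worked comparison of the two terms in the covariance bound stated in the abstract, $\frac{d}{\alpha^2}$ and $\frac{d^{3/2}}{\alpha \varepsilon}$, the short calculation below shows when the privacy term dominates. Constants and logarithmic factors are ignored, and the numerical values are illustrative assumptions rather than figures from the paper.

```latex
\frac{d^{3/2}}{\alpha \varepsilon} \ge \frac{d}{\alpha^{2}}
\iff \alpha \sqrt{d} \ge \varepsilon
\iff \alpha \ge \frac{\varepsilon}{\sqrt{d}}.
% Illustrative values: d = 10^4, \varepsilon = 1, \alpha = 0.1
% threshold: \varepsilon / \sqrt{d} = 0.01 \le \alpha, so the privacy term dominates:
\frac{d^{3/2}}{\alpha \varepsilon} = \frac{10^{6}}{0.1} = 10^{7},
\qquad
\frac{d}{\alpha^{2}} = \frac{10^{4}}{10^{-2}} = 10^{6}.
```

So for a fixed $\varepsilon$, the privacy term governs the bound once $\alpha$ exceeds $\varepsilon / \sqrt{d}$, a threshold that shrinks as the dimension grows.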

Theorems & Definitions (49)

Definition 1.1
Theorem 1.2: Informal, see \ref{thm:covariance-formal}
Theorem 1.3: Informal, see \ref{thm:heavy-tailed-formal}
Lemma 3.1: Folklore
Theorem 3.2
Definition 3.3: Wishart Distribution
Definition 3.4: Inverse Wishart Distribution
Proposition 3.5
Proposition 3.6
Lemma 3.7
...and 39 more
