Fast, Sample-Efficient, Affine-Invariant Private Mean and Covariance Estimation for Subgaussian Distributions

Gavin Brown, Samuel B. Hopkins, Adam Smith

TL;DR

The paper studies differentially private, covariance-aware mean estimation for high-dimensional subgaussian distributions, achieving nearly optimal sample complexity in polynomial time. It introduces StableCovariance and StableMean, along with SCORE, which estimate the mean privately by focusing on large, outlier-free subsets of the data and ensuring stability across adjacent datasets. By coupling these estimators with a Gaussian sampling mechanism, the authors also obtain private covariance estimation for unrestricted subgaussian distributions and fast private learning of Gaussians in total variation distance. The covariance estimate is accurate in spectral norm with $n \gtrsim d^{3/2}$ samples, which answers an open question posed by Alabi et al. (2022), and in Frobenius norm with $n \gtrsim d^2$ samples. The mean estimator runs in time $\tilde{O}(nd^{\omega-1} + nd/\varepsilon)$, whereas only exponential-time estimators were previously known to achieve the same guarantee.

Abstract

We present a fast, differentially private algorithm for high-dimensional covariance-aware mean estimation with nearly optimal sample complexity. Only exponential-time estimators were previously known to achieve this guarantee. Given $n$ samples from a (sub-)Gaussian distribution with unknown mean $\mu$ and covariance $\Sigma$, our $(\varepsilon,\delta)$-differentially private estimator produces $\tilde{\mu}$ such that $\|\mu - \tilde{\mu}\|_{\Sigma} \leq \alpha$ as long as $n \gtrsim \tfrac{d}{\alpha^2} + \tfrac{d \sqrt{\log(1/\delta)}}{\alpha\varepsilon} + \tfrac{d\log(1/\delta)}{\varepsilon}$. The Mahalanobis error metric $\|\mu - \hat{\mu}\|_{\Sigma}$ measures the distance between $\hat{\mu}$ and $\mu$ relative to $\Sigma$; it characterizes the error of the sample mean. Our algorithm runs in time $\tilde{O}(nd^{\omega-1} + nd/\varepsilon)$, where $\omega < 2.38$ is the matrix multiplication exponent. We adapt an exponential-time approach of Brown, Gaboardi, Smith, Ullman, and Zakynthinou (2021), giving efficient variants of stable mean and covariance estimation subroutines that also improve the sample complexity to the nearly optimal bound above. Our stable covariance estimator can be turned into a private covariance estimator for unrestricted subgaussian distributions. With $n \gtrsim d^{3/2}$ samples, our estimate is accurate in spectral norm. This is the first such algorithm using $n = o(d^2)$ samples, answering an open question posed by Alabi et al. (2022). With $n \gtrsim d^2$ samples, our estimate is accurate in Frobenius norm. This leads to a fast, nearly optimal algorithm for private learning of unrestricted Gaussian distributions in TV distance. Duchi, Haque, and Kuditipudi (2023) obtained similar results independently and concurrently.
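
To make the Mahalanobis metric concrete, here is a minimal NumPy sketch, assuming the standard definition $\|v\|_{\Sigma} = \sqrt{v^\top \Sigma^{-1} v}$. It is not the paper's private algorithm: it uses the plain, non-private sample mean, and the function name `mahalanobis_error` and all parameter values are illustrative assumptions. It also checks the affine invariance named in the title: mapping the data by $x \mapsto Ax + b$ with invertible $A$ should leave the error unchanged.

```python
import numpy as np

def mahalanobis_error(mu_hat, mu, sigma):
    """Return ||mu_hat - mu||_Sigma = sqrt(diff^T Sigma^{-1} diff)."""
    diff = np.asarray(mu_hat, dtype=float) - np.asarray(mu, dtype=float)
    # Solve Sigma z = diff instead of forming an explicit inverse.
    z = np.linalg.solve(sigma, diff)
    return float(np.sqrt(diff @ z))

rng = np.random.default_rng(0)
d, n = 5, 2000  # illustrative dimension and sample size

# Hypothetical ground truth: a random mean and a positive definite covariance.
mu = rng.normal(size=d)
B = rng.normal(size=(d, d))
sigma = B @ B.T + np.eye(d)

# Non-private baseline: the sample mean of n Gaussian draws.
x = rng.multivariate_normal(mu, sigma, size=n)
mu_hat = x.mean(axis=0)
err = mahalanobis_error(mu_hat, mu, sigma)

# Affine map x -> A x + b; A is upper triangular with a positive diagonal,
# so it is guaranteed to be invertible.
A = np.triu(rng.normal(size=(d, d)), k=1) + np.diag(rng.uniform(1.0, 2.0, size=d))
b = rng.normal(size=d)
y = x @ A.T + b

# Under the map, the mean becomes A mu + b and the covariance A Sigma A^T.
err_mapped = mahalanobis_error(y.mean(axis=0), A @ mu + b, A @ sigma @ A.T)

print(f"error before map: {err:.6f}")
print(f"error after map:  {err_mapped:.6f}")
```

The two printed values should agree up to floating-point error. The Euclidean error $\|\hat{\mu} - \mu\|_2$, by contrast, generally changes under such a map, which is why the Mahalanobis metric is the natural target for an affine-invariant estimator.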

Paper Structure (40 sections, 30 theorems, 89 equations, 1 figure, 1 algorithm)

This paper contains 40 sections, 30 theorems, 89 equations, 1 figure, 1 algorithm.

Key Result

theorem 2

The paper's main algorithm is $(\varepsilon,\delta)$-differentially private. Given $n \gtrsim \tfrac{d}{\alpha^2} + \tfrac{d \sqrt{\log(1/\delta)}}{\alpha\varepsilon} + \tfrac{d\log(1/\delta)}{\varepsilon}$ samples from a subgaussian distribution on $\mathbb{R}^d$ with mean $\mu$ and covariance $\Sigma$, with high probability it outputs $\tilde{\mu}$ such that $\|\tilde{\mu} - \mu\|_{\Sigma} \le \alpha$. It runs in time $\tilde{O}(nd^{\omega-1} + nd/\varepsilon)$, where $\omega < 2.38$ is the matrix multiplication exponent.
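
As a rough aid to reading the sample-size condition stated in the abstract, the sketch below evaluates its three terms for concrete parameters. It is only an illustration under loud assumptions: the $\gtrsim$ notation hides constants and possibly logarithmic factors, which are all set to 1 here, and the function name and term labels are hypothetical, not taken from the paper.

```python
import math

def sample_bound_terms(d, alpha, eps, delta):
    """Evaluate the three terms of the bound with all hidden constants set to 1.

    The dictionary keys are illustrative labels, not the paper's terminology.
    """
    log_term = math.log(1.0 / delta)
    return {
        # d / alpha^2: the cost of estimating the mean even without privacy.
        "non_private": d / alpha**2,
        # d * sqrt(log(1/delta)) / (alpha * eps): privacy cost that grows as alpha shrinks.
        "privacy_accuracy": d * math.sqrt(log_term) / (alpha * eps),
        # d * log(1/delta) / eps: privacy cost that does not depend on alpha.
        "privacy_fixed": d * log_term / eps,
    }

# Hypothetical parameter choices, for illustration only.
terms = sample_bound_terms(d=100, alpha=0.1, eps=1.0, delta=1e-6)
total = sum(terms.values())

for name, value in terms.items():
    print(f"{name:>16}: {value:12.1f}")
print(f"{'total':>16}: {total:12.1f}")
```

With these particular values the non-private term is the largest, so privacy adds less than the baseline statistical cost. Shrinking `eps` makes the two privacy terms grow and eventually dominate.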

Figures (1)

  • Figure 1: (a) A good subset: no points are outliers with respect to the empirical covariance. Removing one point may change the covariance (from $\Sigma$ to $\Sigma'$) and cause points to become (slight) outliers. (b) Let $S_\ell$ denote the largest $\lambda_\ell$-good subset of $x$ and $T_\ell$ the largest $\lambda_\ell$-good subset of the adjacent dataset $x'$. Let $\tilde{S}_\ell$ denote $S_\ell \setminus \{i^*\}$, where $x$ and $x'$ differ in index $i^*$ (and define $\tilde{T}_\ell$ similarly). These sets are intertwined: for all $\ell$, we have $\tilde{S}_\ell \cup \tilde{T}_\ell \subseteq \tilde{S}_{\ell+1} \cap \tilde{T}_{\ell+1}$.

Theorems & Definitions (76)

  • definition 1: Differential Privacy (DP)
  • theorem 2: Informal, see Theorem \ref{thm:main}
  • theorem 3: Informal, see Theorem \ref{thm:private_covariance_estimation}
  • definition 2
  • lemma 1: Informal
  • lemma 2: Intertwining, Informal
  • proof: sketch
  • theorem 4: Main Theorem
  • lemma 3
  • proof
  • ...and 66 more