Fast, Sample-Efficient, Affine-Invariant Private Mean and Covariance Estimation for Subgaussian Distributions

Gavin Brown; Samuel B. Hopkins; Adam Smith

TL;DR

The paper tackles covariance-aware differential privacy for high-dimensional subgaussian mean estimation, achieving near-optimal sample complexity and polynomial-time computation. It introduces StableCovariance and StableMean, along with SCORE, to privately infer the mean by focusing on large, outlier-free subsets and ensuring stability across adjacent datasets. By coupling these estimators with a Gaussian sampling mechanism, the authors also obtain private covariance estimation for unrestricted subgaussians and fast private learning of Gaussians in total variation distance. The approach resolves open questions about the trade-off between privacy, accuracy, and computation, delivering both spectral/Frobenius guarantees and practical running times. Overall, the work advances efficient, private learning of high-dimensional Gaussian models beyond prior exponential-time methods.

Abstract

We present a fast, differentially private algorithm for high-dimensional covariance-aware mean estimation with nearly optimal sample complexity. Only exponential-time estimators were previously known to achieve this guarantee. Given $n$ samples from a (sub-)Gaussian distribution with unknown mean $\mu$ and covariance $\Sigma$, our $(\varepsilon,\delta)$-differentially private estimator produces $\tilde{\mu}$ such that $\|\mu - \tilde{\mu}\|_{\Sigma} \leq \alpha$ as long as $n \gtrsim \tfrac{d}{\alpha^2} + \tfrac{d \sqrt{\log 1/\delta}}{\alpha\varepsilon} + \tfrac{d\log 1/\delta}{\varepsilon}$. The Mahalanobis error metric $\|\mu - \hat{\mu}\|_{\Sigma}$ measures the distance between $\hat{\mu}$ and $\mu$ relative to $\Sigma$; it characterizes the error of the sample mean. Our algorithm runs in time $\tilde{O}(nd^{\omega - 1} + nd/\varepsilon)$, where $\omega < 2.38$ is the matrix multiplication exponent. We adapt an exponential-time approach of Brown, Gaboardi, Smith, Ullman, and Zakynthinou (2021), giving efficient variants of stable mean and covariance estimation subroutines that also improve the sample complexity to the nearly optimal bound above. Our stable covariance estimator can be turned to private covariance estimation for unrestricted subgaussian distributions. With $n \gtrsim d^{3/2}$ samples, our estimate is accurate in spectral norm. This is the first such algorithm using $n = o(d^2)$ samples, answering an open question posed by Alabi et al. (2022). With $n \gtrsim d^2$ samples, our estimate is accurate in Frobenius norm. This leads to a fast, nearly optimal algorithm for private learning of unrestricted Gaussian distributions in TV distance. Duchi, Haque, and Kuditipudi (2023) obtained similar results independently and concurrently.
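To get a feel for the three terms in the sample-size condition, the sketch below evaluates each one numerically. It is illustrative only: the $\gtrsim$ hides constants and logarithmic factors, so the function `sample_size_terms` and its `constant` parameter (both hypothetical names, with the constant assumed to be 1) indicate orders of magnitude, not the paper's actual thresholds.

```python
import math


def sample_size_terms(d, alpha, eps, delta, constant=1.0):
    """Evaluate the three terms of n >~ d/a^2 + d*sqrt(log(1/delta))/(a*eps) + d*log(1/delta)/eps.

    `constant` stands in for the unspecified factor hidden by the
    asymptotic notation; 1.0 is an assumption made for illustration.
    """
    if not (d > 0 and alpha > 0 and eps > 0 and 0 < delta < 1):
        raise ValueError("need d, alpha, eps > 0 and 0 < delta < 1")
    log_term = math.log(1.0 / delta)
    terms = {
        # Non-private statistical cost of estimating the mean to error alpha.
        "statistical": d / alpha**2,
        # Privacy cost that scales with the target accuracy alpha.
        "privacy_accuracy": d * math.sqrt(log_term) / (alpha * eps),
        # Privacy cost that does not depend on alpha.
        "privacy_floor": d * log_term / eps,
    }
    terms = {name: constant * value for name, value in terms.items()}
    terms["total"] = sum(terms.values())
    return terms


if __name__ == "__main__":
    for name, value in sample_size_terms(d=100, alpha=0.1, eps=1.0, delta=1e-6).items():
        print(f"{name:>17}: {value:,.0f}")
```

Comparing the terms for a given setting shows which regime applies: for small $\alpha$ the non-private $d/\alpha^2$ term dominates, while for small $\varepsilon$ the two privacy terms take over.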

Paper Structure (40 sections, 30 theorems, 89 equations, 1 figure, 1 algorithm)

Introduction
Our Techniques
Background: The Empirical Covariance Approach
Overview of Our Approach
Stable Covariance
Good Subsets
Score Function
Good Subsets on Adjacent Data Sets
Families of Largest Good Subsets on Adjacent Datasets
From $\mathtt{StableCovariance}$ to Mean Estimation
Beyond Mean Estimation: Private Covariance Estimation and Learning of Gaussian Distributions
Comparison with a Concurrent Result
Related Work
Gaussian Mean Estimation
Gaussian Covariance Estimation
...and 25 more sections

Key Result

theorem 2

Algorithm 1 is $(\varepsilon,\delta)$-differentially private. Given $n \gtrsim \tfrac{d}{\alpha^2} + \tfrac{d \sqrt{\log 1/\delta}}{\alpha\varepsilon} + \tfrac{d\log 1/\delta}{\varepsilon}$ samples from a subgaussian distribution on $\mathbb{R}^d$ with mean $\mu$ and covariance $\Sigma$, with high probability it outputs $\tilde{\mu}$ such that $\|\tilde{\mu} - \mu\|_{\Sigma} \le \alpha$. It runs in time $\tilde{O}(nd^{\omega-1} + nd/\varepsilon)$, where $\omega < 2.38$ is the matrix multiplication exponent.
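The error guarantee is stated in the Mahalanobis norm. The sketch below computes it in plain Python, assuming the usual definition $\|v\|_{\Sigma} = \|\Sigma^{-1/2} v\|_2 = \sqrt{v^\top \Sigma^{-1} v}$ (the excerpt does not spell the definition out). The helper names `cholesky` and `mahalanobis_norm` are hypothetical, and this is not the paper's private algorithm, only the error metric it is measured in.

```python
import math


def cholesky(sigma):
    """Return lower-triangular L with L @ L^T == sigma (sigma must be positive definite)."""
    n = len(sigma)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = sigma[i][i] - partial
                if pivot <= 0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (sigma[i][j] - partial) / lower[j][j]
    return lower


def mahalanobis_norm(v, sigma):
    """Compute sqrt(v^T sigma^{-1} v) by solving L w = v, so that ||w||_2 is the answer."""
    lower = cholesky(sigma)
    w = []
    for i in range(len(v)):
        partial = sum(lower[i][k] * w[k] for k in range(i))
        w.append((v[i] - partial) / lower[i][i])
    return math.sqrt(sum(x * x for x in w))


if __name__ == "__main__":
    # Error vector (estimate minus true mean) under identity covariance.
    print(mahalanobis_norm([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]))
    # Same situation after rescaling the first coordinate by 10:
    # the error becomes (10, 1) and the covariance becomes diag(100, 1).
    print(mahalanobis_norm([10.0, 1.0], [[100.0, 0.0], [0.0, 1.0]]))
```

The two printed values agree, which illustrates why the metric suits an affine-invariant estimator: rescaling the data changes both the error vector and $\Sigma$, and the two changes cancel.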

Figures (1)

Figure 1: (a) A good subset: no points are outliers with respect to the empirical covariance. Removing one point may change the covariance (from $\Sigma$ to $\Sigma'$) and cause points to become (slight) outliers. (b) Let $S_\ell$ denote the largest $\lambda_\ell$-good subset of $x$ and $T_\ell$ the same for adjacent $x'$. Let $\tilde{S}_{\ell}$ denote $S_\ell \setminus \{i^*\}$, where $x, x'$ differ in index $i^*$ (and similarly $\tilde{T}_\ell$). These sets are intertwined: for all $\ell$, we have $\tilde{S}_{\ell} \cup \tilde{T}_{\ell} \subseteq \tilde{S}_{\ell+1} \cap \tilde{T}_{\ell+1}$.
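The intertwining condition in panel (b) is a plain statement about index sets, so it can be checked mechanically. The sketch below does only that: it takes two families of index sets as given and tests the inclusion level by level. It does not compute largest good subsets, and the function name `is_intertwined` and the example families are hypothetical, chosen purely for illustration.

```python
def is_intertwined(s_family, t_family, i_star):
    """Check that (S~_l | T~_l) is a subset of (S~_{l+1} & T~_{l+1}) for every level l.

    s_family[l] and t_family[l] play the roles of S_l and T_l;
    the tilde versions drop the index i_star where the datasets differ.
    """
    if len(s_family) != len(t_family):
        raise ValueError("families must have the same number of levels")
    s_tilde = [set(s) - {i_star} for s in s_family]
    t_tilde = [set(t) - {i_star} for t in t_family]
    return all(
        (s_tilde[level] | t_tilde[level]) <= (s_tilde[level + 1] & t_tilde[level + 1])
        for level in range(len(s_tilde) - 1)
    )


if __name__ == "__main__":
    i_star = 0  # index at which the two adjacent datasets differ
    s_family = [{0, 1, 2}, {0, 1, 2, 3, 4}, {0, 1, 2, 3, 4, 5, 6}]
    t_family = [{1, 2, 3}, {0, 1, 2, 3, 4}, {0, 1, 2, 3, 4, 5}]
    print(is_intertwined(s_family, t_family, i_star))

    # Index 7 appears at level 0 of T but not at level 1, breaking the inclusion.
    broken_t_family = [{1, 2, 3, 7}, {0, 1, 2, 3, 4}, {0, 1, 2, 3, 4, 5}]
    print(is_intertwined(s_family, broken_t_family, i_star))
```

The first family satisfies the inclusion at every level and the second fails at level 0, so the script prints `True` followed by `False`.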

Theorems & Definitions (76)

definition 1: Differential Privacy (DP)
theorem 2: Informal, see Theorem \ref{thm:main}
theorem 3: Informal, see Theorem \ref{thm:private_covariance_estimation}
definition 2
lemma 1: Informal
lemma 2: Intertwining, Informal
proof: sketch
theorem 4: Main Theorem
lemma 3
proof
...and 66 more
