CoinPress: Practical Private Mean and Covariance Estimation

Sourav Biswas, Yihe Dong, Gautam Kamath, Jonathan Ullman

TL;DR

The work addresses private estimation of the mean $\mu$ and covariance $\Sigma$ of a multivariate sub-Gaussian distribution, especially in small-sample regimes. It introduces CoinPress, an iterative confidence-ball method that, at each step, clips the data, adds calibrated noise, and tightens the feasible region, yielding private estimates with competitive error. The authors prove that their estimators achieve state-of-the-art asymptotic rates and demonstrate strong empirical performance on synthetic and real data, including better mean-estimation accuracy than prior univariate methods extended to the multivariate setting. The approach reduces reliance on strong a priori parameter estimates and provides a scalable, practical private estimation pipeline for high-dimensional settings.
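The clip, noise, and tighten loop described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes identity-covariance Gaussian data, an even split of the zCDP budget `rho` across steps, and a loose radius update based on a standard chi-square tail bound. All function names, the failure probability `beta`, and the constants are hypothetical choices made for this sketch.

```python
import numpy as np


def _gauss_norm_bound(d, fail_prob):
    """Bound on the norm of a standard d-dim Gaussian, holding w.p. >= 1 - fail_prob."""
    t = np.log(1.0 / fail_prob)
    return np.sqrt(d + 2.0 * np.sqrt(d * t) + 2.0 * t)


def private_mean_step(X, center, radius, rho, rng, beta=0.01):
    """One clip -> noise -> shrink step; assumes the true mean lies in B(center, radius)."""
    n, d = X.shape
    # Samples from N(mu, I) stay close to mu w.h.p., so clip at radius + slack.
    clip_radius = radius + _gauss_norm_bound(d, beta / n)
    diffs = X - center
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    clipped = center + diffs * np.minimum(1.0, clip_radius / np.maximum(norms, 1e-12))
    # Replacing one row moves the clipped mean by at most 2 * clip_radius / n in L2 norm.
    sensitivity = 2.0 * clip_radius / n
    # Gaussian mechanism: sigma^2 = sensitivity^2 / (2 * rho) gives rho-zCDP.
    sigma = sensitivity / np.sqrt(2.0 * rho)
    new_center = clipped.mean(axis=0) + rng.normal(0.0, sigma, size=d)
    # Triangle inequality: sampling error (scale 1/sqrt(n)) plus noise (scale sigma).
    new_radius = (1.0 / np.sqrt(n) + sigma) * _gauss_norm_bound(d, beta)
    return new_center, new_radius


def iterative_private_mean(X, center, radius, rho, steps, rng):
    """Run `steps` rounds; the even budget split rho / steps is an assumption."""
    radii = [radius]
    for _ in range(steps):
        center, radius = private_mean_step(X, center, radius, rho / steps, rng)
        radii.append(radius)
    return center, radii


# Hypothetical usage: 2000 samples in 5 dimensions, loose prior ||mu|| <= 50.
rng = np.random.default_rng(0)
true_mean = np.full(5, 3.0)
X = rng.normal(size=(2000, 5)) + true_mean
estimate, radii = iterative_private_mean(X, np.zeros(5), 50.0, rho=0.5, steps=3, rng=rng)
print("radii per step:", radii)
print("error:", np.linalg.norm(estimate - true_mean))
```

Because the noise scale is proportional to the clipping radius, a loose initial ball costs accuracy only in the first step: once the ball has shrunk, later steps clip more tightly and add far less noise for the same privacy budget.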

Abstract

We present simple differentially private estimators for the mean and covariance of multivariate sub-Gaussian data that are accurate at small sample sizes. We demonstrate the effectiveness of our algorithms both theoretically and empirically using synthetic and real-world datasets -- showing that their asymptotic error rates match the state-of-the-art theoretical bounds, and that they concretely outperform all previous methods. Specifically, previous estimators either have weak empirical accuracy at small sample sizes, perform poorly for multivariate data, or require the user to provide strong a priori estimates for the parameters.

Paper Structure

This paper contains 5 sections, 6 equations, 2 figures.

Figures (2)

  • Figure 1: The cost of privacy, measured as the ratio of our iterative estimator's error to that of the non-private estimator. For mean estimation (left), we use $d = 50$ and privacy level $\rho = 0.5$, and vary $n \in (300,5000)$. For covariance estimation (right), we use $d = 10$ and privacy level $\rho = 0.5$, and vary $n \in (2000,10000)$.
  • Figure 2: Visualizing a run of the mean estimator with $n = 160, \rho = 0.1, t = 3$. The data is represented by the blue dots, the black circles represent the iteratively shrinking confidence ball, and the orange dot is the final private mean estimate.

Theorems & Definitions (3)

  • Definition 1.1: zCDP
  • Definition 2.1: Differential Privacy (DP) [DworkMNS06]
  • Definition 2.2: Concentrated Differential Privacy (zCDP) [BunS16]
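
For reference, the definitions listed above are usually stated as follows in the cited works. The notation here ($M$ for the randomized algorithm, $X \sim X'$ for datasets differing in one entry, $D_\alpha$ for the Rényi divergence of order $\alpha$) is an assumption and may differ from the paper's own wording. The last two lines are standard facts about zCDP, included because they explain how a privacy level $\rho$ translates into Gaussian noise and into an $(\varepsilon, \delta)$ guarantee.

```latex
\begin{align*}
&\text{$(\varepsilon,\delta)$-DP:} &&
  \Pr[M(X) \in S] \le e^{\varepsilon}\,\Pr[M(X') \in S] + \delta
  && \text{for all } X \sim X' \text{ and all events } S, \\
&\text{$\rho$-zCDP:} &&
  D_\alpha\bigl(M(X) \,\|\, M(X')\bigr) \le \rho\,\alpha
  && \text{for all } X \sim X' \text{ and all } \alpha \in (1,\infty), \\
&\text{Gaussian mechanism:} &&
  M(X) = f(X) + \mathcal{N}\!\Bigl(0,\ \tfrac{\Delta^2}{2\rho}\, I\Bigr) \text{ is $\rho$-zCDP}
  && \text{if } \|f(X) - f(X')\|_2 \le \Delta \text{ for all } X \sim X', \\
&\text{Conversion:} &&
  \rho\text{-zCDP} \implies \bigl(\rho + 2\sqrt{\rho \log(1/\delta)},\ \delta\bigr)\text{-DP}
  && \text{for every } \delta > 0.
\end{align*}
```

zCDP also composes additively: running a $\rho_1$-zCDP algorithm followed by a $\rho_2$-zCDP algorithm is $(\rho_1 + \rho_2)$-zCDP, which is what lets an iterative method divide a total budget $\rho$ across its steps.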