PLAN: Variance-Aware Private Mean Estimation
Martin Aumüller, Christian Janos Lebeda, Boel Nelson, Rasmus Pagh
TL;DR
PLAN (Private Limit Adapted Noise) is a family of differentially private mean estimators for high-dimensional data that exploits skew in the coordinate-wise standard deviations $\boldsymbol{\sigma}$. It rescales each coordinate by $\hat{\boldsymbol{\sigma}}^{-2/(p+2)}$ (that is, $\hat{\boldsymbol{\sigma}}^{-1/2}$ for $\ell_2$ error), privately estimates a clipping threshold with PrivQuantile, and then clips and adds Gaussian noise, so that for $\boldsymbol{\sigma}$-well concentrated distributions the $\ell_2$ error contributed by the noise scales with $\|\boldsymbol{\sigma}\|_1$. The utility analysis splits the error into bias from private centering, clipping error, and noise, giving an expected $\ell_2$ error of $\mathbb{E}[\|\tilde{\boldsymbol{\mu}}-\boldsymbol{\mu}\|_2] = \tilde{O}(1 + \|\boldsymbol{\sigma}\|_2/\sqrt{n} + \|\boldsymbol{\sigma}\|_1/(n\sqrt{\rho}))$, where $\rho$ is the zero-concentrated differential privacy parameter, together with a general $\ell_p$ analogue. The method remains competitive even without precise variance estimates. Empirically, PLAN improves on state-of-the-art methods when the variances are skewed, on both synthetic and real datasets, and remains robust when the skew is smaller. The work provides a practical, data-aware alternative to worst-case private mean estimation, with guidance on variance estimation, clipping budgets, and parameter choices.
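The scale, clip, add-noise, rescale pipeline summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' algorithm: the function name `plan_sketch` and its parameters are hypothetical, the rough center `mu_hat`, the standard deviation estimates `sigma_hat`, and the threshold `clip_norm` are passed in as given (PLAN estimates them privately, the threshold via PrivQuantile, and that costs part of the privacy budget), and the scaling exponent $-2/(p+2)$ on standard deviations is the choice assumed here because it makes the $\ell_2$ noise term scale with $\|\boldsymbol{\sigma}\|_1$ when $p = 2$.

```python
import math
import random


def plan_sketch(xs, mu_hat, sigma_hat, clip_norm, rho, p=2.0, rng=None):
    """Illustrative scale -> clip -> Gaussian noise -> rescale pipeline.

    Assumption: mu_hat, sigma_hat and clip_norm are treated as fixed inputs.
    The rho-zCDP guarantee below covers only the noisy mean, not the way
    those three inputs were obtained.
    """
    rng = rng or random.Random()
    n, d = len(xs), len(mu_hat)
    # Per-coordinate scaling; the exponent -2/(p+2) is this sketch's assumption.
    scale = [s ** (-2.0 / (p + 2.0)) for s in sigma_hat]

    total = [0.0] * d
    for x in xs:
        # Center on the rough estimate and reshape the coordinates.
        y = [(x[j] - mu_hat[j]) * scale[j] for j in range(d)]
        # Clip to l2 norm clip_norm so that one record has bounded influence.
        norm = math.sqrt(sum(v * v for v in y))
        factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for j in range(d):
            total[j] += y[j] * factor

    # Replacing one record moves the clipped mean by at most 2*clip_norm/n in l2.
    sensitivity = 2.0 * clip_norm / n
    # Gaussian mechanism: std = sensitivity / sqrt(2*rho) gives rho-zCDP.
    noise_std = sensitivity / math.sqrt(2.0 * rho)

    estimate = []
    for j in range(d):
        noisy = total[j] / n + rng.gauss(0.0, noise_std)
        # Undo the scaling and the centering.
        estimate.append(mu_hat[j] + noisy / scale[j])
    return estimate
```

Because the noise is added in the rescaled space and then divided by `scale[j]`, coordinates with a small standard deviation receive proportionally less noise than they would under isotropic noise, which is the effect the summary attributes to PLAN.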
Abstract
Differentially private mean estimation is an important building block in privacy-preserving algorithms for data analysis and machine learning. Though the trade-off between privacy and utility is well understood in the worst case, many datasets exhibit structure that could potentially be exploited to yield better algorithms. In this paper we present $\textit{Private Limit Adapted Noise}$ (PLAN), a family of differentially private algorithms for mean estimation in the setting where inputs are independently sampled from a distribution $\mathcal{D}$ over $\mathbf{R}^d$, with coordinate-wise standard deviations $\boldsymbol{\sigma} \in \mathbf{R}^d$. Similar to mean estimation under Mahalanobis distance, PLAN tailors the shape of the noise to the shape of the data, but unlike previous algorithms the privacy budget is spent non-uniformly over the coordinates. Under a concentration assumption on $\mathcal{D}$, we show how to exploit skew in the vector $\boldsymbol{\sigma}$, obtaining a (zero-concentrated) differentially private mean estimate with $\ell_2$ error proportional to $\|\boldsymbol{\sigma}\|_1$. Previous work has either not taken $\boldsymbol{\sigma}$ into account, or measured error in Mahalanobis distance – in both cases resulting in $\ell_2$ error proportional to $\sqrt{d}\|\boldsymbol{\sigma}\|_2$, which can be up to a factor $\sqrt{d}$ larger. To verify the effectiveness of PLAN, we empirically evaluate accuracy on both synthetic and real-world data.
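The gap between $\|\boldsymbol{\sigma}\|_1$ and $\sqrt{d}\|\boldsymbol{\sigma}\|_2$ can be checked numerically with a back-of-envelope model. The sketch below is not the paper's analysis. It assumes that coordinate $j$ is scaled by $\sigma_j^{-a}$, that the clipping threshold is proportional to the $\ell_2$ norm of the scaled standard deviations, and that constants, logarithmic factors, and sampling error are ignored. The function name `noise_l2_error` is hypothetical. Under these assumptions, $a = 0$ models noise that ignores $\boldsymbol{\sigma}$, $a = 1$ models full whitening as in Mahalanobis-style estimation, and $a = 1/2$ models a PLAN-like choice for $\ell_2$ error.

```python
import math


def noise_l2_error(sigma, a):
    """l2 noise magnitude, up to a constant, when coordinate j is scaled by sigma_j**(-a)."""
    # The clipping threshold tracks the l2 norm of the scaled standard deviations.
    clip = math.sqrt(sum(s ** (2.0 - 2.0 * a) for s in sigma))
    # Undoing the scaling multiplies coordinate j's noise by sigma_j**a.
    unscale = math.sqrt(sum(s ** (2.0 * a) for s in sigma))
    return clip * unscale


# One dominant coordinate among d = 100 (an illustrative, made-up vector).
sigma = [100.0] + [1.0] * 99
d = len(sigma)
l1 = sum(sigma)
l2 = math.sqrt(sum(s * s for s in sigma))

no_shaping = noise_l2_error(sigma, 0.0)   # equals sqrt(d) * ||sigma||_2
whitening = noise_l2_error(sigma, 1.0)    # also equals sqrt(d) * ||sigma||_2
plan_like = noise_l2_error(sigma, 0.5)    # equals ||sigma||_1

print(f"sqrt(d)*||sigma||_2 = {no_shaping:.1f}, ||sigma||_1 = {plan_like:.1f}")
```

By the Cauchy-Schwarz inequality, $\|\boldsymbol{\sigma}\|_1 \le \sqrt{d}\|\boldsymbol{\sigma}\|_2$, with equality when all coordinates of $\boldsymbol{\sigma}$ are equal, so in this model the intermediate exponent never does worse than the two extremes and gains the most when a few coordinates dominate.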
