A Bias-Accuracy-Privacy Trilemma for Statistical Estimation
Gautam Kamath, Argyris Mouzakis, Matthew Regehr, Vikrant Singhal, Thomas Steinke, Jonathan Ullman
TL;DR
The paper studies the bias tradeoffs inherent in mean estimation under differential privacy (DP). Standard private estimators clip the data and then add noise, and clipping introduces bias. The authors show that this bias is unavoidable by proving a bias-accuracy-privacy trilemma: no private algorithm can achieve low bias, low error, and strong privacy simultaneously for arbitrary distributions. The trilemma is made quantitative through a lower bound on mean squared error (MSE) that complements known upper bounds for noisy clipped-mean estimators. On the positive side, the paper gives an unbiased private mean estimator with a concrete MSE bound under approximate DP for symmetric distributions. It also proves that unbiased estimation is impossible under pure or concentrated DP, even for Gaussian data. The work uses two proof techniques, fingerprinting-based lower bounds and privacy amplification by shuffling, and offers a general-purpose low-bias estimator that combines noisy clipping with tail corrections. Overall, the results clarify the limits of DP mean estimation, indicate when unbiased private estimates are feasible, and show how distributional symmetry and moment assumptions affect bias and the privacy-utility tradeoff.
Abstract
Differential privacy (DP) is a rigorous notion of data privacy, used for private statistics. The canonical algorithm for differentially private mean estimation is to first clip the samples to a bounded range and then add noise to their empirical mean. Clipping controls the sensitivity and, hence, the variance of the noise that we add for privacy. But clipping also introduces statistical bias. This tradeoff is inherent: we prove that no algorithm can simultaneously have low bias, low error, and low privacy loss for arbitrary distributions. Additionally, we show that under strong notions of DP (i.e., pure or concentrated DP), unbiased mean estimation is impossible, even if we assume that the data is sampled from a Gaussian. On the positive side, we show that unbiased mean estimation is possible under a more permissive notion of differential privacy (approximate DP) if we assume that the distribution is symmetric.
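The clip-then-add-noise recipe described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes Laplace noise (one common way to obtain pure DP), replace-one neighboring datasets, and a clipping range [lo, hi] fixed in advance; the function names `clip`, `laplace_noise`, `noisy_clipped_mean`, and `clipping_bias_exponential` are hypothetical. The exponential distribution is used only as an example of a skewed distribution for which the clipping bias has a closed form.

```python
import math
import random


def clip(value, lo, hi):
    """Project a single sample onto the interval [lo, hi]."""
    return max(lo, min(hi, value))


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise.

    The difference of two i.i.d. exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_clipped_mean(samples, lo, hi, epsilon, rng):
    """Clip, average, then add noise calibrated to the sensitivity.

    Replacing one sample moves the clipped mean by at most (hi - lo) / n,
    so Laplace noise with scale (hi - lo) / (n * epsilon) gives epsilon-DP
    (assumption: idealized real arithmetic, replace-one neighbors).
    """
    n = len(samples)
    clipped_mean = sum(clip(x, lo, hi) for x in samples) / n
    sensitivity = (hi - lo) / n
    return clipped_mean + laplace_noise(sensitivity / epsilon, rng)


def clipping_bias_exponential(hi):
    """Exact bias from clipping Exp(1) samples to [0, hi].

    E[min(X, hi)] = 1 - exp(-hi) while E[X] = 1, so the bias is -exp(-hi).
    """
    return -math.exp(-hi)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [rng.expovariate(1.0) for _ in range(10_000)]
    epsilon = 1.0
    for hi in (1.0, 2.0, 4.0, 8.0):
        estimate = noisy_clipped_mean(samples, 0.0, hi, epsilon, rng)
        noise_scale = hi / (len(samples) * epsilon)
        print(f"hi={hi}: estimate={estimate:.4f}, "
              f"bias={clipping_bias_exponential(hi):.4f}, "
              f"noise scale={noise_scale:.6f}")
```

Widening the clipping range shrinks the bias but inflates the noise scale, and narrowing it does the reverse. The sketch only makes that tension visible for one distribution and one mechanism; it does not demonstrate the paper's lower bound, which applies to every private algorithm.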
