A Bias-Accuracy-Privacy Trilemma for Statistical Estimation

Gautam Kamath; Argyris Mouzakis; Matthew Regehr; Vikrant Singhal; Thomas Steinke; Jonathan Ullman

TL;DR

The paper investigates the fundamental bias tradeoffs in mean estimation under differential privacy (DP). It shows that the bias introduced by clipping is inherent: no private algorithm can simultaneously achieve low bias, low error, and low privacy loss for arbitrary distributions. It formalizes this as a bias-accuracy-privacy trilemma and proves a quantitative lower bound on the MSE that complements known upper bounds for noisy clipped-mean estimators. Beyond the negative results, it shows that unbiased private mean estimation is possible under approximate DP for symmetric distributions, with a concrete MSE bound, and proves that unbiased estimation is impossible under pure or concentrated DP, even for Gaussian data. The work uses two proof techniques (fingerprinting-based lower bounds and privacy amplification by shuffling) and offers a general-purpose low-bias estimator that blends noisy clipping with tail corrections. Overall, the results clarify the limits of DP mean estimation, indicate when unbiased estimates are feasible, and show how distributional symmetry and moment assumptions affect bias and privacy-utility tradeoffs.

Abstract

Differential privacy (DP) is a rigorous notion of data privacy, used for private statistics. The canonical algorithm for differentially private mean estimation is to first clip the samples to a bounded range and then add noise to their empirical mean. Clipping controls the sensitivity and, hence, the variance of the noise that we add for privacy. But clipping also introduces statistical bias. This tradeoff is inherent: we prove that no algorithm can simultaneously have low bias, low error, and low privacy loss for arbitrary distributions. Additionally, we show that under strong notions of DP (i.e., pure or concentrated DP), unbiased mean estimation is impossible, even if we assume that the data is sampled from a Gaussian. On the positive side, we show that unbiased mean estimation is possible under a more permissive notion of differential privacy (approximate DP) if we assume that the distribution is symmetric.
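The clip-then-add-noise recipe described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `noisy_clipped_mean`, the clipping range $[0, T]$ (which assumes nonnegative data), and the replace-one-sample notion of neighboring datasets are assumptions made here. Under those assumptions the clipped mean has sensitivity $T/n$, so Laplace noise with scale $\frac{T}{\varepsilon n}$ gives $\varepsilon$-DP.

```python
import numpy as np

def noisy_clipped_mean(x, T, eps, rng):
    """Clip samples to [0, T], average them, and add Laplace noise.

    Hypothetical helper for illustration. Assumes nonnegative data and
    neighboring datasets that differ by replacing one sample.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    clipped = np.clip(x, 0.0, T)      # clipping bounds each sample's influence
    sensitivity = T / n               # max change in the clipped mean from one sample
    noise = rng.laplace(loc=0.0, scale=sensitivity / eps)
    return clipped.mean() + noise

rng = np.random.default_rng(0)
data = rng.lognormal(mean=np.log(60_000), sigma=1.0, size=500)
estimate = noisy_clipped_mean(data, T=200_000.0, eps=1.0, rng=rng)
```

A smaller `T` shrinks the noise scale but discards more of the upper tail, which is the source of the bias the paper studies.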

Paper Structure (13 sections, 9 theorems, 33 equations, 2 figures, 1 table)

Introduction
Differential Privacy
Contributions
Overview of Results
Our Techniques
Negative Results via the Fingerprinting Method (Theorem \ref{thm:main_trilemma_informal}, §\ref{sec:trilemma}).
Negative Results via Amplification (Theorem \ref{thm:main_trilemma_informal} Revisited, §\ref{subsec:amplification}).
General-Purpose Low-Bias Mean Estimation (Theorem \ref{thm:eps_delta_ub_mix}, §\ref{sec:low_bias}).
Unbiased Mean Estimation for Symmetric Distributions (Theorem \ref{thm:positive_unbiased}, §\ref{sec:symmetric}).
Negative Result for Pure DP Unbiased Mean Estimation (Theorem \ref{thm:main_packing}, §\ref{sec:analyticity}).
Related Work
Main Bias-Accuracy-Privacy Trilemma
Negative Result via Fingerprinting

Key Result

Theorem 1.2

Let $M : \mathbb{R}^n \to \mathbb{R}$ be an $(\varepsilon,\delta)$-DP algorithm, for some $\varepsilon,\delta$ satisfying $0 < \delta \leq \varepsilon^2 / 200 \leq 1$. (This assumption is natural because $\varepsilon = \Theta(1)$ and $\delta = o(1/n)$ is the standard DP parameter regime.) Suppose $M$ sati[…] Then […]

Figures (2)

Figure 1: $n = 500$ samples are drawn from the log-normal distribution with median 60,000 and variance 1. A silhouette of the distribution is lightly shaded (y-axis not to scale). The data are clipped at a specified threshold $T$ and Laplace noise with parameter $\frac{T}{\varepsilon n}$ is added. The statistical bias due to clipping (dotted) and standard error introduced due to the noise and sampling (dashed) are plotted, as well as the root mean squared error (RMSE, solid). To minimize the RMSE, the clipping threshold must be chosen judiciously to balance the bias and the noise, and at a different value depending on the privacy parameter $\varepsilon$.
Figure 2: Bias, standard error, and root-mean-square error of the Laplace mechanism on the University of California Report on 2011 Employee Pay [UCal2011]. A histogram of the dataset is indicated by the shaded region.
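The three curves described in the Figure 1 caption can be approximated in closed form. The sketch below is an illustration under stated assumptions, not the authors' plotting code: it reads "median 60,000 and variance 1" as log-scale parameters $\mu = \ln 60000$ and $\sigma = 1$, clips to $[0, T]$, and uses the identity $\mathbb{E}[X^k \mathbf{1}\{X \le T\}] = e^{k\mu + k^2\sigma^2/2}\,\Phi\!\left(\frac{\ln T - \mu - k\sigma^2}{\sigma}\right)$ for a log-normal $X$. The names `clipped_moment` and `error_terms` are hypothetical.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def clipped_moment(k, T, mu, sigma):
    """E[min(X, T)^k] for X ~ LogNormal(mu, sigma^2)."""
    z = (math.log(T) - mu - k * sigma**2) / sigma
    below = math.exp(k * mu + 0.5 * k**2 * sigma**2) * phi(z)
    above = T**k * (1.0 - phi((math.log(T) - mu) / sigma))
    return below + above

def error_terms(T, eps, n=500, mu=math.log(60_000), sigma=1.0):
    """Return (bias, standard error, RMSE) of the noisy clipped mean."""
    true_mean = math.exp(mu + 0.5 * sigma**2)
    m1 = clipped_moment(1, T, mu, sigma)
    m2 = clipped_moment(2, T, mu, sigma)
    bias = m1 - true_mean                    # clipping from above lowers the mean
    sampling_var = (m2 - m1**2) / n          # variance of the clipped sample mean
    noise_var = 2.0 * (T / (eps * n)) ** 2   # Laplace(b) has variance 2 b^2
    std_err = math.sqrt(sampling_var + noise_var)
    rmse = math.sqrt(bias**2 + std_err**2)
    return bias, std_err, rmse

# Scan a few thresholds at eps = 1 to see the bias-noise tradeoff.
for T in (50_000, 100_000, 200_000, 500_000, 1_000_000):
    bias, std_err, rmse = error_terms(T, eps=1.0)
    print(f"T={T:>9,}  bias={bias:>12.1f}  std_err={std_err:>10.1f}  rmse={rmse:>10.1f}")
```

As $T$ grows the bias term shrinks toward zero while the noise term grows linearly in $T$, so the RMSE-minimizing threshold shifts with $\varepsilon$, matching the caption's observation.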

Theorems & Definitions (12)

Definition 1.1: Differential Privacy (DP) [DworkMNS06]
Theorem 1.2: Bias-Accuracy-Privacy Trilemma
Theorem 1.3: Tightness of Bias-Accuracy-Privacy Trilemma
Theorem 1.4: Unbiased Private Mean Estimation for Symmetric Distributions
Theorem 1.5: Impossibility of Unbiased Estimators under Pure DP
Theorem 2.1: Bias-Accuracy-Privacy Tradeoff
Lemma 2.2: Setting $\tau=\delta^{1-1/\kappa}$ in Theorem \ref{thm:flpb}
Proof: Proof of Lemma \ref{lem:2nd_cond_implies_3rd}
Corollary 2.3: Combining Theorem \ref{thm:flpb}, Lemma \ref{lem:2nd_cond_implies_3rd} (with $\kappa=2$), and Equation \ref{eq:optgamma}.
Theorem 2.4: Setting $\lambda=2$ in Corollary \ref{cor:optgamma} to get Theorem \ref{thm:main_trilemma_informal}
...and 2 more
