Private Means and the Curious Incident of the Free Lunch
Jack Fitzsimons, James Honaker, Michael Shoemate, Vikrant Singhal
TL;DR
The paper addresses private mean estimation under differential privacy (DP) when the dataset size is unknown. It introduces the simplex augmentation transformation, which maps each datum to a two-dimensional point on a simplex so that two sums can be released privately under a single shared privacy budget. A private count (the dataset size) can then be recovered from these sums by post-processing, at no additional privacy cost, and the approach extends to weighted means. If additional budget is available, the count can be refined further via inverse-variance weighting. In the reported experiments, the simplex method consistently achieves lower variance than standard DP approaches (plugin, centered mean, resize) across multiple distributions and privacy settings. Overall, the method reuses sensitivity that has already been budgeted to extract extra information without increasing privacy loss, improving the accuracy of private mean estimates in the unknown-size regime.
Abstract
We show that the most well-known and fundamental building blocks of DP implementations -- sum, mean, count (and many other linear queries) -- can be released with substantially reduced noise for the same privacy guarantee. We achieve this by projecting individual data with worst-case sensitivity $R$ onto a simplex where every datum has constant norm $R$. In this simplex, additional ``free'' queries can be run that are already covered by the privacy loss of the original budgeted query, and which algebraically give additional estimates of counts or sums.
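The idea can be illustrated with a minimal Python sketch. It is one reading of the description above, not the authors' implementation. It assumes data clamped to $[0, R]$, the add/remove-one-record neighboring relation, the Laplace mechanism, and the mapping $x \mapsto (x, R - x)$. All function names are hypothetical. Under these assumptions every augmented point has $L_1$ norm exactly $R$, so the pair of sums has the same $L_1$ sensitivity $R$ as the plain sum, and the two noisy sums add up to an estimate of $nR$.

```python
import random


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def simplex_augment(x, R):
    """Clamp x to [0, R] and map it to (x, R - x), whose L1 norm is always R."""
    R = float(R)
    x = min(max(float(x), 0.0), R)
    return x, R - x


def private_sums(data, R, epsilon, rng):
    """Release both coordinate sums under a single budget epsilon.

    Assumption: adding or removing one record changes the pair of sums by
    (x, R - x), so the L1 sensitivity is R, the same as for the plain sum.
    """
    points = [simplex_augment(x, R) for x in data]
    s1 = sum(p[0] for p in points)
    s2 = sum(p[1] for p in points)
    scale = R / epsilon  # Laplace scale for L1 sensitivity R
    return s1 + laplace(scale, rng), s2 + laplace(scale, rng)


def free_count_and_mean(noisy_s1, noisy_s2, R):
    """Post-process only: s1 + s2 estimates n * R, so dividing by R gives n."""
    count = (noisy_s1 + noisy_s2) / R
    mean = noisy_s1 / count if count > 0 else float("nan")
    return count, mean


def inverse_variance_combine(est_a, var_a, est_b, var_b):
    """Combine two independent unbiased estimates, weighting each by 1/variance."""
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    return (w_a * est_a + w_b * est_b) / (w_a + w_b)


if __name__ == "__main__":
    rng = random.Random(0)
    R, epsilon = 10.0, 1.0
    data = [rng.uniform(0.0, R) for _ in range(5000)]

    s1, s2 = private_sums(data, R, epsilon, rng)
    count, mean = free_count_and_mean(s1, s2, R)
    print("free count:", count, "private mean:", mean)

    # Optional refinement: spend an extra budget eps_c on a direct Laplace count
    # (sensitivity 1), then combine. Total privacy loss becomes epsilon + eps_c.
    eps_c = 0.5
    direct_count = len(data) + laplace(1.0 / eps_c, rng)
    var_free = 4.0 / epsilon**2   # two Laplace(R/eps) noises, divided by R
    var_direct = 2.0 / eps_c**2   # one Laplace(1/eps_c) noise
    print("refined count:",
          inverse_variance_combine(count, var_free, direct_count, var_direct))
```

The free count costs nothing extra because it is computed purely from the two noisy sums. The refinement step is the only part of the sketch that consumes additional budget.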
