Private Count Release: A Simple and Scalable Approach for Private Data Analytics
Ryan Rogers
TL;DR
The paper addresses the challenge of scalable, accurate count release under differential privacy with minimal onboarding. It introduces Private Count Release (PCR), which leverages Unknown Domain Gumbel to adaptively identify and privatize top counts without relying on $\ell_0$-sensitivity bounds, while maintaining $\delta$-approximate $\rho$-CDP and DP guarantees. The approach is evaluated on multiple public datasets (Finance, Reddit, Wikipedia, MovieLens), showing that PCR can achieve high recall with controlled relative error and less hyperparameter tuning than percentile-bound approaches like Plume. The work highlights practical implications for deploying DP in real-world analytics pipelines, offering a more scalable, black-box-friendly alternative and outlining avenues for further improvement, such as noise-reduction techniques and extension to additional aggregations.
Abstract
We present a data analytics system that releases accurate counts with differential privacy and minimal onboarding effort, and we show instances where it outperforms other approaches that require more onboarding effort. The primary difference between our proposal and existing approaches is that it does not rely on user contribution bounds over distinct elements, i.e. $\ell_0$-sensitivity bounds, which can significantly bias counts. Contribution bounds for $\ell_0$-sensitivity have been considered necessary to ensure differential privacy, but we show that they are not, and that removing them can lead to releasing more results that are more accurate. We require minimal hyperparameter tuning and demonstrate results on several publicly available datasets. We hope that this approach will help differential privacy scale to many different data analytics applications.
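To make the bias from $\ell_0$ contribution bounding concrete, the following is a minimal sketch and not the paper's system. The function names, the toy data, and the rule of keeping each user's first `max_distinct` elements in sorted order are all assumptions made for illustration (deployed systems often sample the kept elements at random instead). No noise is added, so the gap between the two results is purely the bias introduced by bounding.

```python
from collections import Counter

def exact_counts(user_items):
    """Count, for each element, how many users contributed it (no bounding)."""
    counts = Counter()
    for items in user_items.values():
        counts.update(set(items))  # each user counts at most once per element
    return counts

def l0_bounded_counts(user_items, max_distinct):
    """Same count, but each user may contribute to at most `max_distinct` elements.

    Assumption: keep the first `max_distinct` elements in sorted order so the
    example is deterministic.
    """
    counts = Counter()
    for items in user_items.values():
        kept = sorted(set(items))[:max_distinct]
        counts.update(kept)
    return counts

# Hypothetical toy data: user id -> elements the user contributed to.
user_items = {
    "u1": ["a", "b", "c", "d"],
    "u2": ["a", "b"],
    "u3": ["a", "c", "d"],
}

exact = exact_counts(user_items)
bounded = l0_bounded_counts(user_items, max_distinct=2)

for element in sorted(exact):
    print(element, "exact:", exact[element], "bounded:", bounded[element])
```

In this toy case the bounded count for `c` drops from 2 to 1 and `d` disappears entirely, which illustrates how an $\ell_0$ bound can bias counts downward and suppress results before any privacy noise is applied.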
