Private Count Release: A Simple and Scalable Approach for Private Data Analytics

Ryan Rogers

TL;DR

The paper addresses the challenge of scalable, accurate count release under differential privacy with minimal onboarding. It introduces Private Count Release (PCR), which leverages Unknown Domain Gumbel to adaptively identify and privatize top counts without relying on $\ell_0$-sensitivity bounds, while maintaining $\delta$-approximate $\rho$-CDP and DP guarantees. The approach is evaluated on multiple public datasets (Finance, Reddit, Wikipedia, MovieLens), showing PCR can achieve high recall with controlled relative error and reduced hyperparameter tuning compared to percentile-bound approaches like Plume. The work highlights practical implications for deploying DP in real-world analytics pipelines, offering a more scalable, blackbox-friendly alternative and outlining avenues for further improvement, such as noise-reduction techniques and extension to additional aggregations.

Abstract

We present a data analytics system that releases accurate counts under differential privacy with minimal onboarding effort, and we show instances in which it outperforms approaches that require more onboarding effort. The primary difference between our proposal and existing approaches is that it does not rely on user contribution bounds over distinct elements, i.e., $\ell_0$-sensitivity bounds, which can significantly bias counts. Contribution bounds for $\ell_0$-sensitivity have been considered necessary to ensure differential privacy, but we show that they are not, and that dropping them can lead to releasing more results that are more accurate. We require minimal hyperparameter tuning and demonstrate results on several publicly available datasets. We hope that this approach will help differential privacy scale to many different data analytics applications.
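
To make the bias from $\ell_0$ contribution bounding concrete, the following is a minimal sketch on hypothetical toy data, not the paper's system. The function name `bounded_counts`, the data, and the "keep the first distinct items" rule are all assumptions for illustration (deployed systems often sample the kept items instead), and no noise is added, so nothing here is differentially private on its own.

```python
from collections import Counter

def bounded_counts(user_items, l0_bound):
    """Count items after keeping at most `l0_bound` distinct items per user."""
    counts = Counter()
    for items in user_items.values():
        # Deduplicate while preserving order, then truncate to the bound.
        distinct = list(dict.fromkeys(items))
        counts.update(distinct[:l0_bound])
    return counts

# Hypothetical toy data: user id -> items that user contributed.
user_items = {
    "u1": ["a", "b", "c", "d"],
    "u2": ["a", "b"],
    "u3": ["a", "c", "d"],
}

true_counts = bounded_counts(user_items, l0_bound=10)   # bound never binds
capped_counts = bounded_counts(user_items, l0_bound=2)  # bound binds for u1, u3

for item in sorted(true_counts):
    print(item, true_counts[item], capped_counts[item])
```

With a bound of 2, the counts for the items that users list later are pushed down, and one item disappears entirely, before any privacy noise is applied.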

Paper Structure (9 sections, 6 theorems, 8 equations, 5 figures, 1 table, 2 algorithms)

Introduction
Preliminaries
Private Count Release
Results
Finance:
Reddit subsample and full data:
Wikipedia:
MovieLens:
Conclusion

Key Result

Theorem 1

If $A$ is $(\varepsilon, \delta)$-DP then it is $\delta$-approximate $\varepsilon^2/2$-CDP. If $A$ is $\delta$-approximate $\rho$-CDP then it is $(\rho + 2\sqrt{\rho \log(1/\delta')}, \delta' + \delta)$-DP for any $\delta'>0$.
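
The two conversions in Theorem 1 are simple enough to evaluate directly. The sketch below transcribes them into Python; the function names `dp_to_cdp` and `cdp_to_dp` are hypothetical, and the choice of $\delta = 0$ with $\delta' = 10^{-6}$ in the usage loop is an assumption for illustration, not a setting taken from the paper.

```python
import math

def dp_to_cdp(epsilon, delta):
    """(epsilon, delta)-DP implies delta-approximate rho-CDP with rho = epsilon^2 / 2."""
    if epsilon < 0 or not (0 <= delta < 1):
        raise ValueError("need epsilon >= 0 and 0 <= delta < 1")
    return epsilon ** 2 / 2, delta

def cdp_to_dp(rho, delta, delta_prime):
    """delta-approximate rho-CDP implies (epsilon, delta' + delta)-DP, where
    epsilon = rho + 2 * sqrt(rho * log(1 / delta'))."""
    if rho < 0 or not (0 <= delta < 1) or not (0 < delta_prime < 1):
        raise ValueError("need rho >= 0, 0 <= delta < 1, and 0 < delta_prime < 1")
    epsilon = rho + 2 * math.sqrt(rho * math.log(1 / delta_prime))
    return epsilon, delta_prime + delta

# Illustrative usage: how epsilon grows with rho at a fixed delta'.
for rho in (0.1, 0.5, 1.0):
    epsilon, total_delta = cdp_to_dp(rho, delta=0.0, delta_prime=1e-6)
    print(f"rho={rho}: epsilon={epsilon:.3f}, delta={total_delta:.1e}")
```

Because $\delta'$ is free, smaller values of $\delta'$ buy a smaller failure probability at the cost of a larger $\varepsilon$.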

Figures (5)

Figure 1: Results for three approaches: PCR, Plume, and Plume with Threshold on the Finance data. We show recall and precision for $\rho \in \{0.1, 0.5, 1.0 \}$ and $\delta = 10^{-6}$ averaged over 10 independent trials.
Figure 2: Results for three approaches: PCR, Plume, and Plume with Threshold on the Reddit comments subsample data. We show recall and precision for $\rho \in \{0.1, 0.5, 1.0 \}$ and $\delta = 10^{-6}$ averaged over 10 independent trials. The top plots use the true 95th-percentile for contribution bounding in Plume and the bottom plots use the true 99th-percentile for contribution bounding in Plume.
Figure 3: Results for three approaches: PCR, Plume, and Plume with Threshold on the Reddit comments full data. We show recall and precision for $\rho \in \{0.1, 0.5, 1.0 \}$ and $\delta = 10^{-6}$ averaged over 10 independent trials. The top plots use the true 95th-percentile for contribution bounding in Plume and the bottom plots use the true 99th-percentile for contribution bounding in Plume.
Figure 4: Results for three approaches: PCR, Plume, and Plume with Threshold on the Wikipedia data. We show recall and precision for $\rho \in \{0.1, 0.5, 1.0 \}$ and $\delta = 10^{-6}$ averaged over 10 independent trials. The top plots use the true 95th-percentile for contribution bounding in Plume and the bottom plots use the true 99th-percentile for contribution bounding in Plume.
Figure 5: Results for three approaches: PCR, Plume, and Plume with Threshold on the MovieLens data. We show recall and precision for $\rho \in \{0.1, 0.5, 1.0 \}$ and $\delta = 10^{-6}$ averaged over 10 independent trials. The top plots use the true 95th-percentile for contribution bounding in Plume and the bottom plots use the true 99th-percentile for contribution bounding in Plume.
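
The captions report recall and precision without defining them. As a rough guide, the sketch below uses the standard set-based definitions; the paper may define the reference set of "relevant" items differently (for example, by a count or relative-error criterion), so the function `precision_recall`, its arguments, and the empty-set conventions are assumptions.

```python
def precision_recall(released, relevant):
    """Set-based precision and recall of released item keys.

    released: keys the private mechanism output.
    relevant: keys that should ideally have been output (assumed reference set).
    """
    released, relevant = set(released), set(relevant)
    hits = len(released & relevant)
    # Convention (assumed): an empty denominator yields a score of 1.0.
    precision = hits / len(released) if released else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall

# Hypothetical example: 3 keys released, 2 of which are among 4 relevant keys.
precision, recall = precision_recall(["a", "b", "c"], ["a", "b", "d", "e"])
print(precision, recall)
```

Averaging these two numbers over independent runs of a mechanism would give per-setting summaries of the kind the figures describe.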

Theorems & Definitions (9)

Definition 2.1: DworkMcNiSm06, DworkKeMcMiNa06
Definition 2.2: BunSt16, PapernotSt22
Theorem 1: BunSt16
Theorem 2: BunSt16
Theorem 3
Theorem 4: DurfeeRo19, Rogers23
Theorem 5: WhitehouseRaRoWu23
Theorem 6
proof