Improving the Variance of Differentially Private Randomized Experiments through Clustering
Adel Javanmard, Vahab Mirrokni, Jean Pouget-Abadie
TL;DR
This work tackles the problem of estimating causal effects under differential privacy by exploiting non-private clustering structure to reduce the variance penalty induced by DP noise. It introduces Cluster-DP, a cluster-aware DP mechanism, and its Cluster-Free variant, along with an efficient unbiased estimator $\hat{\tau}_Q$ that debiases privatized outcomes using the inverse of a cluster-specific routing matrix $Q_{c,a}$. Theoretical results establish precise DP guarantees and variance bounds that depend on cluster quality via cluster-quality terms $\phi_a$, demonstrating improved privacy-utility trade-offs when clusters are homogeneous. Empirically, Cluster-DP consistently achieves lower variance than the baselines (Uniform-Prior-DP and Cluster-Free-DP) across synthetic models and a real YouTube network dataset, highlighting its practical impact for privacy-preserving causal analysis in ads and other clustered settings.
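The debiasing idea (invert a known noise matrix to undo the bias in privatized outcomes) can be illustrated with a minimal sketch. It assumes a simplified, hypothetical keep-or-resample mechanism on discretized outcomes rather than the paper's exact Cluster-DP: each unit's outcome bucket is kept with probability $1-\lambda$ and otherwise replaced by a draw from a fixed distribution `prior`. The names `make_Q`, `privatize`, `debiased_mean`, and `effect_estimate` are illustrative, and a single cluster with one $Q$ per treatment arm stands in for the cluster-and-arm-indexed $Q_{c,a}$.

```python
import numpy as np

# Hypothetical keep-or-resample mechanism (NOT the paper's exact Cluster-DP):
# each unit's discretized outcome bucket is kept with probability 1 - lam and
# otherwise replaced by a draw from a fixed replacement distribution `prior`.

def make_Q(lam, prior):
    """Column-stochastic Q with Q[j, i] = P(report bucket j | true bucket i)."""
    prior = np.asarray(prior, dtype=float)
    k = prior.size
    return (1.0 - lam) * np.eye(k) + lam * np.outer(prior, np.ones(k))

def privatize(buckets, lam, prior, rng):
    """Apply the keep-or-resample mechanism independently to each unit."""
    buckets = np.asarray(buckets)
    replace = rng.random(buckets.shape) < lam
    draws = rng.choice(len(prior), size=buckets.shape, p=prior)
    return np.where(replace, draws, buckets)

def debiased_mean(reported, Q, values):
    """Unbiased estimate of the mean outcome: values . Q^{-1} q_hat.

    E[q_hat] = Q pi (pi = true bucket frequencies), so solving Q x = q_hat
    gives an unbiased estimate of pi; Q is invertible whenever lam < 1.
    """
    q_hat = np.bincount(reported, minlength=Q.shape[0]) / len(reported)
    pi_hat = np.linalg.solve(Q, q_hat)
    return float(np.asarray(values, dtype=float) @ pi_hat)

def effect_estimate(rep_treated, rep_control, Q_treated, Q_control, values):
    """Difference of debiased arm means: an estimate of the average effect."""
    return (debiased_mean(rep_treated, Q_treated, values)
            - debiased_mean(rep_control, Q_control, values))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    values = np.array([0.0, 1.0, 2.0])   # outcome value of each bucket
    prior = np.full(3, 1 / 3)            # uniform replacement distribution
    Q = make_Q(0.5, prior)
    treated = rng.choice(3, size=100_000, p=[0.2, 0.3, 0.5])
    control = rng.choice(3, size=100_000, p=[0.5, 0.3, 0.2])
    truth = values[treated].mean() - values[control].mean()
    est = effect_estimate(privatize(treated, 0.5, prior, rng),
                          privatize(control, 0.5, prior, rng), Q, Q, values)
    print(f"true effect {truth:.3f}, debiased estimate {est:.3f}")
```

Because $\mathbb{E}[\hat q] = Q\pi$ is linear in the bucket frequencies $\pi$, solving against $Q$ removes the bias exactly; the cost is variance, since the eigenvalues of $Q^{-1}$ include $1/(1-\lambda)$, which amplifies sampling noise as $\lambda$ approaches 1.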
Abstract
Estimating causal effects from randomized experiments is only possible if participants are willing to disclose their potentially sensitive responses. Differential privacy, a widely used framework for ensuring an algorithm's privacy guarantees, can encourage participants to share their responses without the risk of de-anonymization. However, many mechanisms achieve differential privacy by adding noise to the original dataset, which reduces the precision of causal effect estimates. This introduces a fundamental trade-off between privacy and variance when performing causal analyses on differentially private data. In this work, we propose a new differentially private mechanism, "Cluster-DP", which leverages a given cluster structure in the data to improve the privacy-variance trade-off. While our results apply to any clustering, we show that selecting higher-quality clusters, according to a quality metric we introduce, can decrease the variance penalty without compromising privacy guarantees. Finally, we evaluate the theoretical and empirical performance of our Cluster-DP algorithm on both real and simulated data, comparing it to common baselines, including two special cases of our algorithm: its unclustered version and a uniform-prior version.
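The privacy-variance trade-off can be made concrete with a toy calculation. This sketch assumes a simple mechanism that is not the paper's Cluster-DP: $k$-ary keep-or-resample with a uniform replacement distribution, which is exactly $k$-ary randomized response. For that mechanism it computes the local-DP level $\varepsilon = \ln\bigl(1 + k(1-\lambda)/\lambda\bigr)$ and the exact variance of the debiased mean $(\bar{r} - \lambda\mu_p)/(1-\lambda)$, where $\bar{r}$ is the mean reported value and $\mu_p$ the prior mean. The helper names `ldp_epsilon` and `debiased_mean_variance` are hypothetical.

```python
import numpy as np

def ldp_epsilon(lam, k):
    """Local-DP level of k-ary keep-or-resample with a uniform prior.

    P(report the true bucket) = 1 - lam + lam/k and P(report any other
    bucket) = lam/k, so the worst-case likelihood ratio is 1 + k(1-lam)/lam.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must be in (0, 1)")
    return float(np.log(1.0 + k * (1.0 - lam) / lam))

def debiased_mean_variance(true_buckets, lam, values):
    """Exact variance of (mean(reported) - lam * mu_p) / (1 - lam) under the
    same mechanism, with a uniform prior over len(values) buckets."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must be in (0, 1)")
    values = np.asarray(values, dtype=float)
    v = values[np.asarray(true_buckets)]         # true outcome value per unit
    mu_p = values.mean()                         # prior mean
    m2_p = (values ** 2).mean()                  # prior second moment
    first = (1 - lam) * v + lam * mu_p           # E[reported_i]
    second = (1 - lam) * v ** 2 + lam * m2_p     # E[reported_i ** 2]
    per_unit = second - first ** 2               # Var[reported_i]
    n = len(v)
    return float(per_unit.sum() / n ** 2 / (1 - lam) ** 2)

if __name__ == "__main__":
    values = np.array([0.0, 1.0, 2.0])
    true_buckets = np.array([0, 1, 2, 2, 1, 0, 2, 1])
    for lam in (0.1, 0.5, 0.9):
        eps = ldp_epsilon(lam, len(values))
        var = debiased_mean_variance(true_buckets, lam, values)
        print(f"lam={lam}: epsilon={eps:.3f}, variance={var:.4f}")
```

Raising $\lambda$ lowers $\varepsilon$ (stronger privacy) while the variance grows, since each unit contributes $\lambda(v_i-\mu_p)^2/(1-\lambda) + \lambda\sigma_p^2/(1-\lambda)^2$, both terms increasing in $\lambda$. Reducing this penalty at a fixed privacy level is the goal the abstract describes.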
