On Improving the Composition Privacy Loss in Differential Privacy for Fixed Estimation Error
V. Arvind Rameshwar, Anshoo Tandon
TL;DR
This work addresses the private release of means and variances across multiple disjoint grids under user-level differential privacy, where users contribute differing numbers of samples. It gives exact analytical characterizations of the mean and variance sensitivities and of the worst-case clipping biases, and proposes an iterative Clip-User procedure that suppresses user contributions to reduce the privacy loss under composition without increasing the worst-case estimation error. The authors also provide a post-processing technique based on pseudo-user creation to further reduce grid-wise error. They validate the approach on real Intelligent Traffic Management System data and on synthetic datasets, showing clear reductions in privacy loss at a fixed worst-case estimation error. The methodology applies to a broad class of grid-based private statistics and offers practical guidance for tracking privacy loss in multi-query, multi-grid settings, with potential extensions to approximate DP and other statistics.
Abstract
This paper considers the private release of statistics of disjoint subsets of a dataset in the setting of data heterogeneity, where users may contribute more than one sample and different users may contribute different numbers of samples. In particular, we focus on the $\epsilon$-differentially private release of sample means and variances of sample values in disjoint subsets of a dataset, under the assumption that the number of contributions of each user in each subset is publicly known. Our main contribution is an iterative algorithm, based on suppressing user contributions, which seeks to reduce the overall privacy loss degradation under a canonical Laplace mechanism, while not increasing the worst estimation error among the subsets. Important components of this analysis are our exact, analytical characterizations of the sensitivities and the worst-case bias errors of estimators of the sample mean and variance, which are obtained by clipping or suppressing user contributions. We test the performance of our algorithm on real-world and synthetic datasets and demonstrate clear improvements in the privacy loss degradation, for fixed worst-case estimation error.
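To make the role of clipping concrete, the following is a minimal sketch of a user-level private mean for a single subset. It is not the paper's iterative algorithm. It assumes sample values lie in [0, upper], that per-user contribution counts are public, and that each user keeps at most `cap` samples. With counts fixed, replacing one user's samples changes the clipped sum by at most `upper` times that user's kept count, so the sensitivity of the clipped mean is `upper * max(kept counts) / sum(kept counts)`. All function and parameter names here are hypothetical.

```python
import random


def clipped_mean_sensitivity(counts, cap, upper):
    """User-level sensitivity of the clipped mean, assuming values in
    [0, upper] and publicly known per-user counts (hypothetical helper)."""
    kept = [min(m, cap) for m in counts]
    return upper * max(kept) / sum(kept)


def private_clipped_mean(samples_by_user, cap, upper, epsilon, rng=random):
    """Release the mean of at most `cap` samples per user with Laplace noise
    calibrated to the sensitivity above (a sketch, not the authors' method)."""
    # Clamp values to [0, upper] so the sensitivity bound actually holds.
    kept = [[min(max(x, 0.0), upper) for x in s[:cap]] for s in samples_by_user]
    total = sum(sum(s) for s in kept)
    n_kept = sum(len(s) for s in kept)
    counts = [len(s) for s in samples_by_user]
    scale = clipped_mean_sensitivity(counts, cap, upper) / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with scale `scale`.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return total / n_kept + noise


# One heavy user dominates the sensitivity unless contributions are capped.
counts = [1, 2, 10]
print(clipped_mean_sensitivity(counts, cap=10, upper=1.0))  # no effective cap
print(clipped_mean_sensitivity(counts, cap=2, upper=1.0))   # capped at 2
```

For a fixed Laplace noise scale b, the privacy loss of one release is the sensitivity divided by b, so lowering the sensitivity by capping contributions lowers the per-subset privacy loss. Capping also biases the estimate, because the mean is then taken over fewer samples. Under basic composition for pure differential privacy, the per-subset losses add up when a user may contribute to several subsets.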
