Privately Answering Queries on Skewed Data via Per Record Differential Privacy
Jeremy Seeman, William Sexton, David Pujol, Ashwin Machanavajjhala
TL;DR
This work introduces per-record zero-concentrated DP (PRzCDP), a privacy framework in which a record's privacy loss is a function of that record's confidential value. A public policy function P maps each hypothetical record to its maximum allowable privacy loss; the losses actually incurred depend on the confidential data, which permits better utility for skewed or heavy-tailed statistics. The authors propose unit splitting, a preprocessing step that reduces the task of satisfying PRzCDP to running standard zCDP mechanisms on the split data, giving a constructive and flexible way to publish private SQL-style aggregates with reduced sensitivity. Empirical results on simulated, CIS, and CBP datasets show substantial utility improvements over global zCDP on skewed-data workloads, supporting PRzCDP as a practical approach for data products with influential outliers, such as county-level payrolls and establishment counts. The work also outlines future directions: stronger semantic guarantees and extensions to interactive query settings.
Abstract
We consider the problem of privately releasing statistics (such as aggregate payrolls) where it is critical to preserve the contribution made by a small number of outlying large entities. We propose a privacy formalism, per-record zero-concentrated differential privacy (PRzCDP), in which the privacy loss associated with each record is a public function of that record's value. Unlike other formalisms that assign different privacy losses to different records, PRzCDP lets the privacy loss depend explicitly on the confidential data. We define our formalism, derive its properties, and propose mechanisms satisfying PRzCDP that are well suited to publishing skewed or heavy-tailed statistics, where a small number of records contribute substantially to query answers. This targeted relaxation helps overcome the difficulties of applying standard DP to these data products.
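
To make the unit-splitting idea concrete, here is a minimal Python sketch for a single sum query. It is an illustration under stated assumptions, not the authors' mechanism. The function names (split_record, policy_bound, noisy_sum) and the threshold parameter are hypothetical. Values are assumed nonnegative, and neighboring datasets are assumed to differ by adding or removing one split piece. The sketch relies on two standard zCDP facts: Gaussian noise with standard deviation sigma on a query of sensitivity D satisfies (D^2 / (2 sigma^2))-zCDP, and group privacy over k records scales rho by k^2. Under those assumptions, a record split into k pieces has privacy loss at most k^2 * rho, which plays the role of the public policy function P here.

```python
import math
import random


def split_record(value, threshold):
    """Split one nonnegative value into k equal pieces, each <= threshold."""
    k = max(1, math.ceil(value / threshold))
    return [value / k] * k


def policy_bound(value, threshold, rho):
    """Hypothetical policy P: loss bound for a record with this value.

    Assumes zCDP group privacy: a record split into k pieces incurs
    at most k^2 * rho, where rho is the zCDP level on the split data.
    """
    k = max(1, math.ceil(value / threshold))
    return k * k * rho


def noisy_sum(values, threshold, rho, rng):
    """Gaussian mechanism on the split data (rho-zCDP per split piece)."""
    pieces = [p for v in values for p in split_record(v, threshold)]
    # Each piece is at most `threshold`, so the sum has sensitivity
    # `threshold` under add/remove of one piece.
    sigma = threshold / math.sqrt(2.0 * rho)
    return sum(pieces) + rng.gauss(0.0, sigma)


if __name__ == "__main__":
    rng = random.Random(0)
    payrolls = [1.0, 2.0, 10.0]  # made-up toy values, one large outlier
    print(noisy_sum(payrolls, threshold=3.0, rho=0.5, rng=rng))
    for v in payrolls:
        print(v, policy_bound(v, threshold=3.0, rho=0.5))
```

The design trade-off in this sketch is that the noise scale depends on the threshold rather than on the largest possible value, while records above the threshold receive a weaker (larger) loss bound that grows with the number of pieces they are split into.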
