Slowly Scaling Per-Record Differential Privacy
Brian Finley, Anthony M Caruso, Justin C Doty, Ashwin Machanavajjhala, Mikaela R Meyer, David Pujol, William Sexton, Zachary Terner
TL;DR
This work introduces slowly scaling per-record zero-concentrated differential privacy (PRzCDP) mechanisms for protecting statistics derived from heavy-tailed data. It presents two mechanism families, transformation mechanisms (concave mappings with Gaussian noise) and additive mechanisms (fat-tailed noise), under which privacy loss grows sublinearly in a record's influence, limiting the extreme losses that outliers would otherwise incur. The paper provides formal PRzCDP guarantees, unbiased estimators for transformed queries, and an empirical evaluation on CBP-like and cattle datasets, showing improved privacy for high-influence records while maintaining utility. These mechanisms enable finer-grained privacy-utility tradeoffs for large establishments and other high-impact records in economic data releases.
Abstract
We develop formal privacy mechanisms for releasing statistics from data with many outlying values, such as income data. These mechanisms ensure that a per-record differential privacy guarantee degrades slowly in the protected records' influence on the statistics being released. Formal privacy mechanisms generally add randomness, or "noise," to published statistics. If a noisy statistic's distribution changes little with the addition or deletion of a single record in the underlying dataset, an attacker looking at this statistic will find it plausible that any particular record was present or absent, preserving the records' privacy. More influential records (those whose addition or deletion would change the statistics' distribution more) typically suffer greater privacy loss. The per-record differential privacy framework quantifies these record-specific privacy guarantees, but existing mechanisms let these guarantees degrade rapidly (linearly or quadratically) with influence. While this may be acceptable in cases with some moderately influential records, it results in unacceptably high privacy losses when records' influence varies widely, as is common in economic data. We develop mechanisms with privacy guarantees that instead degrade as slowly as logarithmically with influence. These mechanisms allow for the accurate, unbiased release of statistics, while providing meaningful protection for highly influential records. As an example, we consider the private release of sums of unbounded establishment data such as payroll, where our mechanisms extend meaningful privacy protection even to very large establishments. We evaluate these mechanisms empirically and demonstrate their utility.
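The contrast between quadratic and logarithmic degradation can be illustrated with a minimal sketch. It is not the paper's mechanism: it assumes nonnegative record values, a sum query, the standard Gaussian-mechanism zCDP bound rho = delta^2 / (2 sigma^2) applied per record, and a hypothetical log1p transformation standing in for a generic concave mapping. The function names (linear_rho, log_rho, release_log_sum) are illustrative only. Because log1p is concave, adding a record of value v to a nonnegative sum changes log1p(sum) by at most log1p(v), so the per-record bound grows with log1p(v)^2 rather than v^2.

```python
import math
import random


def linear_rho(value: float, sigma: float) -> float:
    """Per-record zCDP bound when Gaussian noise is added to the raw sum.

    Adding or deleting a record of size `value` shifts the sum by `value`,
    so the bound grows quadratically in the record's influence.
    """
    return value ** 2 / (2 * sigma ** 2)


def log_rho(value: float, sigma: float) -> float:
    """Per-record zCDP bound when Gaussian noise is added to log1p(sum).

    Assumes all record values are nonnegative. By concavity,
    log1p(S + v) - log1p(S) <= log1p(v) for any S >= 0.
    """
    return math.log1p(value) ** 2 / (2 * sigma ** 2)


def release_log_sum(values, sigma: float, rng: random.Random) -> float:
    """Release a noisy log-transformed sum (illustrative transformation only).

    Note: naively inverting with math.expm1 gives a biased estimate of the
    sum; the unbiased estimators mentioned in the abstract are not shown.
    """
    return math.log1p(sum(values)) + rng.gauss(0.0, sigma)


if __name__ == "__main__":
    # The two sigmas live on different scales (raw units vs. log units),
    # so compare growth rates across record sizes, not absolute values.
    for v in (1e3, 1e6, 1e9):
        print(f"value={v:.0e}  linear_rho={linear_rho(v, 1.0):.3e}  "
              f"log_rho={log_rho(v, 1.0):.3e}")
```

Under these assumptions, moving from a record of size 10^3 to one of size 10^9 multiplies the linear bound by 10^12 but the log-transformed bound by less than 10, which is the sense in which the guarantee "degrades slowly" with influence.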
