A proposal to increase data utility on Global Differential Privacy data based on data use predictions
Henry C. Nunes, Marlon P. da Silva, Charles V. Neu, Avelino F. Zorzo
TL;DR
This work aims to increase data utility under Global Differential Privacy (DP) for summary statistics. It does so by predicting how analysts will reuse the released statistics and allocating the privacy budget $\epsilon$ to prioritize the queries those predictions single out. It proposes a formal metric for comparing budget allocations that combines the utility loss of individual statistics with that of arithmetic expressions (equations) built from them: $Metric(Tup, Eqs) = \sum_{i=1}^{nsta} us(Tup[i]) + \sum_{i=1}^{neq} ue(Eqs[i])$, where $nsta$ is the number of statistics in $Tup$ and $neq$ is the number of equations in $Eqs$. The paper describes a DP scenario with a Developer, a Curator, and an Analyst, in which the allocation of $\epsilon$ influences the noise added to each statistic and, in turn, downstream utility. Future work includes defining concrete utility measures $us$ and $ue$, evaluating simple operations, and exploring automatic optimization (e.g., gradient descent or closed-form solutions) to find the optimal budget split.
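To make the metric concrete, the Python sketch below evaluates $Metric(Tup, Eqs)$ for two budget splits. The paper leaves $us$ and $ue$ undefined, so everything here is a hypothetical illustration. Each statistic is assumed to be released with the Laplace mechanism, and $Tup[i]$ is taken to be the budget $\epsilon_i$ of statistic $i$. $us$ is assumed to be the noise variance $2(\Delta_i/\epsilon_i)^2$ for L1 sensitivity $\Delta_i$. Each equation is assumed to be a linear combination of statistics, with $ue$ the variance of the combined noise. The function names `us`, `ue` and `metric` and the example queries are hypothetical.

```python
# Sketch of Metric(Tup, Eqs) = sum_i us(Tup[i]) + sum_i ue(Eqs[i]).
# ASSUMPTIONS (hypothetical, not from the paper):
#   * Tup[i] is the budget eps_i of statistic i, released with Laplace noise
#     at L1 sensitivity sens[i]; budgets add up (sequential composition).
#   * us = variance of that Laplace noise, 2 * (sens / eps) ** 2.
#   * each equation is a linear combination {stat_index: coefficient} of the
#     released statistics, and ue = variance of the combined noise.


def us(eps: float, sens: float) -> float:
    """Utility loss of one statistic: variance of Laplace(sens / eps) noise."""
    return 2.0 * (sens / eps) ** 2


def ue(eq: dict[int, float], budgets: list[float], sens: list[float]) -> float:
    """Utility loss of a linear equation: variances of independent noise add."""
    return sum(c * c * us(budgets[i], sens[i]) for i, c in eq.items())


def metric(budgets: list[float], sens: list[float], eqs: list[dict[int, float]]) -> float:
    """Loss of the individual statistics plus loss of the predicted equations."""
    stat_loss = sum(us(e, s) for e, s in zip(budgets, sens))
    eq_loss = sum(ue(eq, budgets, sens) for eq in eqs)
    return stat_loss + eq_loss


if __name__ == "__main__":
    sens = [1.0, 1.0, 1.0]        # three counting queries
    eqs = [{0: 1.0, 1: -1.0}]     # predicted use: analyst computes stat0 - stat1
    uniform = [0.5, 0.5, 0.5]     # total eps = 1.5
    skewed = [0.55, 0.55, 0.4]    # same total, favours the statistics used in eqs
    print("uniform:", metric(uniform, sens, eqs))  # 40.0
    print("skewed: ", metric(skewed, sens, eqs))   # about 38.95
```

Because the analyst's equations are post-processing of already released statistics, they consume no extra budget; only the split of the fixed total $\epsilon = 1.5$ changes. In this toy setting, shifting budget towards the two statistics the analyst is predicted to combine lowers the metric below the uniform split's value of 40.0.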
Abstract
This paper presents ongoing research on improving the utility of data protected by Global Differential Privacy (DP) when releasing summary statistics. Our approach relies on predictions of how an analyst will use the statistics released under DP, so that a developer can take this further use into account when allocating the privacy budget and thereby optimise data utility. This novel approach can potentially improve data utility without relaxing the privacy constraints. We also propose a metric that the developer can use to optimise the budget allocation process.
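The abstract presents the metric as a tool for optimising the budget allocation, and the TL;DR lists closed-form solutions as future work. Under one hypothetical choice of utility loss, such a closed form exists, and the Python sketch below illustrates it. The assumptions are not from the paper. Noise is taken to be Laplace with variance $2(\Delta_i/\epsilon_i)^2$, and equations are taken to be linear combinations of independent noisy statistics. Sequential composition is also assumed, so the budgets must sum to a fixed total. The metric then reduces to $\sum_i k_i/\epsilon_i^2$, and setting the derivative of the Lagrangian to zero gives $\epsilon_i \propto k_i^{1/3}$. The names `cost_weights`, `metric` and `optimal_split` are hypothetical.

```python
# ASSUMPTIONS (hypothetical, not from the paper):
#   * statistic i is released with Laplace noise of variance 2 * (sens_i / eps_i) ** 2;
#   * each predicted equation is a linear combination {stat_index: coefficient};
#   * sequential composition: the budgets eps_i must sum to total_eps.
# Then Metric = sum_i k_i / eps_i**2 with k_i = 2 * sens_i**2 * (1 + sum of coeff_i**2),
# and the Lagrange condition -2 * k_i / eps_i**3 + lam = 0 gives eps_i ~ k_i ** (1/3).


def cost_weights(sens: list[float], eqs: list[dict[int, float]]) -> list[float]:
    """k_i: the statistic's own variance factor plus one term per equation using it."""
    k = [2.0 * s * s for s in sens]
    for eq in eqs:
        for i, c in eq.items():
            k[i] += 2.0 * sens[i] ** 2 * c * c
    return k


def metric(budgets: list[float], sens: list[float], eqs: list[dict[int, float]]) -> float:
    """Variance-based Metric for a given split: sum_i k_i / eps_i**2."""
    return sum(ki / e ** 2 for ki, e in zip(cost_weights(sens, eqs), budgets))


def optimal_split(total_eps: float, sens: list[float], eqs: list[dict[int, float]]) -> list[float]:
    """Closed-form minimiser of metric() subject to sum(budgets) == total_eps."""
    roots = [ki ** (1.0 / 3.0) for ki in cost_weights(sens, eqs)]
    norm = sum(roots)
    return [total_eps * r / norm for r in roots]


if __name__ == "__main__":
    sens = [1.0, 1.0, 1.0]       # three counting queries
    eqs = [{0: 1.0, 1: -1.0}]    # predicted use: stat0 - stat1
    best = optimal_split(1.5, sens, eqs)
    print("optimal split:", [round(e, 3) for e in best])  # about [0.537, 0.537, 0.426]
    print("metric (optimal):", metric(best, sens, eqs))
    print("metric (uniform):", metric([0.5, 0.5, 0.5], sens, eqs))  # 40.0
```

Since each term $k_i/\epsilon_i^2$ is convex for $\epsilon_i > 0$, the stationary point is the global minimum, so no iterative search such as gradient descent is needed for this particular loss. With other choices of $us$ and $ue$ (for example, equations involving products or ratios of statistics) a closed form may not exist, and a numerical optimiser would be required.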
