A proposal to increase data utility on Global Differential Privacy data based on data use predictions

Henry C. Nunes; Marlon P. da Silva; Charles V. Neu; Avelino F. Zorzo

TL;DR

This work tackles the problem of increasing data utility under Global Differential Privacy (DP) for summary statistics. It predicts how analysts will reuse the released statistics and allocates the privacy budget $\epsilon$ to favour the queries involved. A formal metric is proposed to compare budget allocations. It combines the utility loss of the individual statistics with that of the arithmetic expressions (equations) built from them: $Metric(Tup, Eqs) = \sum_{i=1}^{nsta} us(Tup[i]) + \sum_{i=1}^{neq} ue(Eqs[i])$. Here $Tup$ holds the $nsta$ released statistics and $Eqs$ the $neq$ predicted equations. The paper describes a DP scenario with a Developer, a Curator, and an Analyst, in which the allocation of $\epsilon$ determines the total noise and therefore the downstream utility. Future work includes defining concrete utility measures $us$ and $ue$, evaluating simple operations, and exploring automatic optimisation (e.g., gradient descent or closed-form solutions) to find the optimal budget split.
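The metric can be made concrete with a small sketch. The paper leaves $us$ and $ue$ undefined. For illustration only, the following Python therefore makes three assumptions. Every statistic is a count released with the Laplace mechanism. $us$ is the noise variance of a statistic. $ue$ is the noise variance of a linear equation over the released statistics. The names `Statistic`, `laplace_variance`, `us`, `ue`, `metric` and `allocation` are all hypothetical. Unlike $ue(Eqs[i])$ in the formula, `ue` here also receives the tuple of statistics.

```python
# Hypothetical sketch of Metric(Tup, Eqs) = sum us(Tup[i]) + sum ue(Eqs[i]).
# Assumptions (not from the paper): Laplace mechanism, us = noise variance of
# a statistic, ue = noise variance of a linear equation over statistics.
from dataclasses import dataclass


@dataclass
class Statistic:
    sensitivity: float  # L1 sensitivity of the underlying query
    epsilon: float      # share of the global privacy budget spent on it


def laplace_variance(stat: Statistic) -> float:
    # Laplace noise with scale b = sensitivity / epsilon has variance 2 * b^2.
    scale = stat.sensitivity / stat.epsilon
    return 2.0 * scale * scale


def us(stat: Statistic) -> float:
    # Assumed utility loss of one released statistic.
    return laplace_variance(stat)


def ue(equation: dict[str, float], tup: dict[str, Statistic]) -> float:
    # Assumed utility loss of a linear equation sum_j c_j * stat_j.
    # The noises are independent, so Var(sum c_j X_j) = sum c_j^2 Var(X_j).
    return sum(c * c * laplace_variance(tup[name]) for name, c in equation.items())


def metric(tup: dict[str, Statistic], eqs: list[dict[str, float]]) -> float:
    # Lower is better: total assumed utility loss of statistics and equations.
    return sum(us(s) for s in tup.values()) + sum(ue(e, tup) for e in eqs)


def allocation(eps_a: float, eps_b: float, eps_c: float) -> dict[str, Statistic]:
    # Three counting queries (sensitivity 1) sharing a total budget of 1.
    return {
        "a": Statistic(1.0, eps_a),
        "b": Statistic(1.0, eps_b),
        "c": Statistic(1.0, eps_c),
    }


# Predicted analyst use: a + b and a - b (hypothetical example).
predicted_eqs = [{"a": 1.0, "b": 1.0}, {"a": 1.0, "b": -1.0}]

uniform = allocation(1 / 3, 1 / 3, 1 / 3)
skewed = allocation(0.375, 0.375, 0.25)

print(round(metric(uniform, predicted_eqs), 2))  # → 126.0
print(round(metric(skewed, predicted_eqs), 2))   # → 117.33
```

Under these assumptions, a total budget of $\epsilon = 1$ is spent through sequential composition. The predicted equations reuse `a` and `b` but not `c`. Shifting budget from `c` towards `a` and `b` therefore lowers the metric from 126 to about 117.3.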

Abstract

This paper presents ongoing research on improving the utility of data protected by Global Differential Privacy (DP) in the scenario of summary statistics. Our approach relies on predictions of how an analyst will use the statistics released under DP protection. A developer can then take that further use into account when allocating the privacy budget, and so optimise data utility. This novel approach can potentially improve the utility of the data without weakening the privacy guarantees. We also propose a metric that the developer can use to optimise the budget allocation process.
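Under suitable assumptions, the optimal budget split can be derived in closed form. This is one instance of the closed-form route listed as future work. The following choices are illustrative and are not the paper's definitions:

- Statistic $i$ is released with the Laplace mechanism, with sensitivity $\Delta_i$ and budget share $\epsilon_i$.
- The shares compose sequentially, so $\sum_i \epsilon_i = \epsilon$.
- $us$ and $ue$ are noise variances.
- Every predicted equation is linear, $Eqs[k] = \sum_i c_{k,i}\,Tup[i]$.

With the weight $w_i = 1 + \sum_{k=1}^{neq} c_{k,i}^2$ of each statistic, the metric becomes a function of the shares alone, and a sketch of the derivation is:

$$
Metric = \sum_{i=1}^{nsta} w_i \,\frac{2\Delta_i^2}{\epsilon_i^2}
\qquad \text{subject to} \qquad \sum_{i=1}^{nsta} \epsilon_i = \epsilon
$$

$$
\mathcal{L} = \sum_{i=1}^{nsta} \frac{2 w_i \Delta_i^2}{\epsilon_i^2} + \lambda\Big(\sum_{i=1}^{nsta} \epsilon_i - \epsilon\Big),
\qquad
\frac{\partial \mathcal{L}}{\partial \epsilon_i} = -\frac{4 w_i \Delta_i^2}{\epsilon_i^3} + \lambda = 0
$$

$$
\Rightarrow\quad \epsilon_i = \Big(\frac{4 w_i \Delta_i^2}{\lambda}\Big)^{1/3}
\quad\Rightarrow\quad
\epsilon_i^{*} = \epsilon\,\frac{(w_i \Delta_i^2)^{1/3}}{\sum_{j=1}^{nsta} (w_j \Delta_j^2)^{1/3}}
$$

Each term is strictly convex for $\epsilon_i > 0$ and the constraint is linear, so this stationary point is the unique global minimum. The predicted equations shift budget towards the statistics they reuse, in proportion to the cube root of $w_i \Delta_i^2$. With no predicted equations and equal sensitivities, the rule reduces to the uniform split.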

Paper Structure

This paper contains 4 sections, 11 equations, and 1 figure. Its sections are:

Introduction
Problem Statement
A Metric to support the process of budget allocation
Conclusion and Further Work

Figures (1)

Figure 1: Scenario
