Wasserstein Markets for Differentially-Private Data
Saurab Chhachhi, Fei Teng
TL;DR
This paper establishes a Wasserstein-distance based framework to value aggregated differentially-private data, linking distributional differences to task performance through Lipschitz guarantees and endogenous DP modelling. It provides three procurement mechanisms—exogenous budget, endogenous budget, and joint optimization—reformulated into tractable MISOCPs via Myerson's lemma, enabling budget-feasible, incentive-compatible data purchasing without sharing raw data. A Hoeffding-bound approximation decouples valuation from task complexity, and private computation of the WD is discussed to preserve data owners' privacy. Numerical experiments on synthetic parameter estimation validate WD as a robust, task-agnostic valuation metric and demonstrate how DP, non-IID data, and budget structures shape procurement choices and Shapley allocations. The work offers a principled, scalable approach for privacy-preserving data markets with practical implications for data owners and buyers in real-world decision making.
Abstract
Data is an increasingly vital component of decision making processes across industries. However, data access raises privacy concerns motivating the need for privacy-preserving techniques such as differential privacy. Data markets provide a means to enable wider access as well as determine the appropriate privacy-utility trade-off. Existing data market frameworks either require a trusted third party to perform computationally expensive valuations or are unable to capture the combinatorial nature of data value and do not endogenously model the effect of differential privacy. This paper addresses these shortcomings by proposing a valuation mechanism based on the Wasserstein distance for differentially-private data, and corresponding procurement mechanisms by leveraging incentive mechanism design theory, for task-agnostic data procurement, and task-specific procurement co-optimisation. The mechanisms are reformulated into tractable mixed-integer second-order cone programs, which are validated with numerical studies.
