Wasserstein Markets for Differentially-Private Data

Saurab Chhachhi, Fei Teng

TL;DR

This paper establishes a framework based on the Wasserstein distance (WD) for valuing aggregated differentially-private (DP) data, linking distributional differences to task performance through Lipschitz guarantees and endogenous DP modelling. It provides three procurement mechanisms (exogenous budget, endogenous budget, and joint optimisation), each reformulated into a tractable mixed-integer second-order cone program (MISOCP) via Myerson's lemma, enabling budget-feasible, incentive-compatible data purchasing without sharing raw data. A Hoeffding-bound approximation decouples valuation from task complexity, and private computation of the WD is discussed to preserve data owners' privacy. Numerical experiments on synthetic parameter estimation validate the WD as a robust, task-agnostic valuation metric and demonstrate how DP, non-IID data, and budget structures shape procurement choices and Shapley allocations. The work offers a principled, scalable approach to privacy-preserving data markets, with practical implications for data owners and buyers.

Abstract

Data is an increasingly vital component of decision-making processes across industries. However, data access raises privacy concerns, motivating the need for privacy-preserving techniques such as differential privacy. Data markets provide a means to enable wider access as well as to determine the appropriate privacy-utility trade-off. Existing data market frameworks either require a trusted third party to perform computationally expensive valuations, or are unable to capture the combinatorial nature of data value and do not endogenously model the effect of differential privacy. This paper addresses these shortcomings by proposing a valuation mechanism for differentially-private data based on the Wasserstein distance, together with corresponding procurement mechanisms, built on incentive mechanism design theory, for task-agnostic data procurement and for task-specific procurement co-optimisation. The mechanisms are reformulated into tractable mixed-integer second-order cone programs, which are validated with numerical studies.

Paper Structure

This paper contains 42 sections, 2 theorems, 23 equations, 15 figures, 2 tables.

Key Result

Theorem 1

Given a $K_{\mathcal{M}}$-Lipschitz loss function $l(x_i)$ for a task $\mathcal{M}$, the difference in the expected loss obtained using $X_P$ or $X_T$ is bounded by the Wasserstein distance (WD) between them (Ghorbani2020).
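A bound of this kind is commonly written, via Kantorovich-Rubinstein duality, as $\left|\mathbb{E}[l(X_P)] - \mathbb{E}[l(X_T)]\right| \le K_{\mathcal{M}} \, W_1(X_P, X_T)$. This form is an assumption here (it takes the WD to be the 1-Wasserstein distance $W_1$) and may differ in detail from the paper's exact statement. The minimal sketch below checks the inequality numerically on synthetic one-dimensional Gaussian samples. The sample parameters, the absolute-error loss, and the helper names `wasserstein_1d` and `abs_loss` are illustrative choices, not taken from the paper.

```python
import random


def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    For equal sample sizes the optimal coupling pairs the sorted values,
    so W1 is the mean absolute difference of the order statistics.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def abs_loss(x, theta=0.0):
    """Absolute-error loss |x - theta|, which is 1-Lipschitz in x."""
    return abs(x - theta)


def mean(values):
    return sum(values) / len(values)


rng = random.Random(0)  # fixed seed for reproducibility
x_p = [rng.gauss(0.5, 1.5) for _ in range(2000)]  # hypothetical procured data
x_t = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # hypothetical target data

K = 1.0  # Lipschitz constant of abs_loss
wd = wasserstein_1d(x_p, x_t)
loss_gap = abs(mean([abs_loss(x) for x in x_p]) - mean([abs_loss(x) for x in x_t]))

print(f"W1 = {wd:.4f}, expected-loss gap = {loss_gap:.4f}")
# The gap in empirical expected loss never exceeds K times the empirical W1.
assert loss_gap <= K * wd + 1e-9
```

The sorted-pairing shortcut holds only for one-dimensional samples of equal size. For unequal sizes or higher dimensions, a general optimal-transport solver would be needed instead.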

Figures (15)

  • Figure 1: Overview of Proposed Valuation Framework
  • Figure 2: Dataflow for Proposed Data Procurement Mechanisms
  • Figure 3: Performance of Lipschitz Bounds for Different Data Distributions. Top: Mean of Metrics as Function of Coalition Size. Bottom: Scatterplot of all Coalitions against WD.
  • Figure 4: Correlations between Distances and Loss Functions.
  • Figure 5: Shapley Allocations for Gaussian Data.
  • ...and 10 more figures

Theorems & Definitions (4)

  • Theorem 1: Lipschitz Bound
  • Theorem 2: Hoeffding Bound
  • Definition 3: Data Procurement Mechanism
  • Definition 4