Privacy-Aware Data Acquisition under Data Similarity in Regression Markets

Shashi Raj Pandey, Pierre Pinson, Petar Popovski

TL;DR

The paper addresses regression data markets where data owners' privacy preferences and feature correlations influence data value and participation. It introduces a two-stage Stackelberg framework with $\epsilon$-Local Differential Privacy to model a central learner purchasing features from privacy-aware agents, and develops utility models that capture privacy costs, information leakage, and contribution-based valuations. A low-complexity first-order backward-induction algorithm computes the Stackelberg equilibrium, with theoretical guarantees on feasibility, incentive compatibility, and individual rationality. Numerical results show that data similarity depresses data value and affects participation, while the mechanism can balance privacy and utility through price signaling and privacy budgeting. The work provides a principled design for privacy-aware data acquisition in distributed regression tasks, with implications for market efficiency and data-sharing incentives.
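
To make the backward-induction idea concrete, the following is a minimal sketch of a two-stage leader-follower game solved from the last stage backwards. It is not the paper's algorithm or its utility model: the owner's quadratic privacy cost, the learner's valuation `log(1 + eps)`, the search bounds, and all function names are assumptions chosen only so that the toy problem has a closed-form check.

```python
import math


def follower_best_response(price: float, cost: float, eps_max: float) -> float:
    """Stage 2 (owner): maximize price*eps - cost*eps**2 over [0, eps_max].

    Hypothetical utility. The first-order condition price - 2*cost*eps = 0
    gives eps = price / (2*cost), which is then clipped to the feasible range.
    """
    return min(max(price / (2.0 * cost), 0.0), eps_max)


def leader_utility(price: float, cost: float, eps_max: float) -> float:
    """Stage 1 (learner): valuation of the anticipated eps minus the payment.

    Hypothetical valuation log(1 + eps); the payment is price * eps.
    """
    eps = follower_best_response(price, cost, eps_max)
    return math.log1p(eps) - price * eps


def solve_stackelberg(cost: float, eps_max: float,
                      price_hi: float = 10.0, iters: int = 200):
    """Backward induction: search the leader's price while plugging in the
    follower's best response. Ternary search suffices here because the toy
    leader utility is unimodal in the price."""
    lo, hi = 0.0, price_hi
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if leader_utility(m1, cost, eps_max) < leader_utility(m2, cost, eps_max):
            lo = m1
        else:
            hi = m2
    price = 0.5 * (lo + hi)
    return price, follower_best_response(price, cost, eps_max)


if __name__ == "__main__":
    p_star, eps_star = solve_stackelberg(cost=1.0, eps_max=1.0)
    print(f"price* = {p_star:.4f}, eps* = {eps_star:.4f}")
```

In this toy setting the leader's first-order condition reduces to $p^2 + 2cp - c = 0$, so for $c = 1$ the search should land near $p^* = \sqrt{2} - 1$ with $\epsilon^* = p^*/2$. Solving the follower's problem first and substituting it into the leader's objective is what makes the procedure backward induction.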

Abstract

Data markets facilitate decentralized data exchange for applications such as prediction, learning, or inference. The design of these markets is challenged by varying privacy preferences as well as data similarity among data owners. Related works have often overlooked how data similarity impacts pricing and data value through statistical information leakage. We demonstrate that data similarity and privacy preferences are integral to market design and propose a query-response protocol using local differential privacy for a two-party data acquisition mechanism. In our regression data market model, we analyze strategic interactions between privacy-aware owners and the learner as a Stackelberg game over the asked price and privacy factor. Finally, we numerically evaluate how data similarity affects market participation and traded data value.
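
As a rough illustration of a locally private response to a query, the sketch below perturbs a single bounded feature value with the Laplace mechanism, a standard way to satisfy $\epsilon$-local differential privacy for a scalar in a known range. The abstract does not say which mechanism the paper uses, so the Laplace choice, the clipping bounds `lower` and `upper`, and the function name are assumptions made for illustration only.

```python
import random


def ldp_laplace_response(x: float, epsilon: float, lower: float, upper: float,
                         rng: random.Random) -> float:
    """Return a noisy report of one scalar feature value.

    Assumed mechanism (not taken from the paper): clip x to [lower, upper],
    then add Laplace noise with scale (upper - lower) / epsilon. Any two
    inputs differ by at most (upper - lower) after clipping, so each single
    report satisfies epsilon-local differential privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if upper <= lower:
        raise ValueError("upper must exceed lower")
    clipped = min(max(x, lower), upper)
    scale = (upper - lower) / epsilon
    # The difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return clipped + noise


if __name__ == "__main__":
    rng = random.Random(0)
    for eps in (0.5, 5.0):
        reports = [ldp_laplace_response(0.5, eps, 0.0, 1.0, rng) for _ in range(5)]
        print(eps, [round(r, 3) for r in reports])
```

A smaller privacy factor $\epsilon$ yields noisier reports, which is the lever a learner would pay to relax. Releasing several values from the same owner consumes privacy budget for each report under standard composition.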

Paper Structure (11 sections, 4 theorems, 12 equations, 8 figures, 1 algorithm)

Key Result

Lemma 1

Given the incurred cost of data exchanges $c_n$ in the regression market, with a shared value of instantaneous information leakage, there exists a unique Nash equilibrium $q_n^*$ defining the probability of supporting agents joining the collaborative training in the regression market.

Figures (8)

  • Figure 1: An illustration of learner's valuation $U(\epsilon)$ for asked data privacy factor $\epsilon$.
  • Figure 2: Example scenario: heatmap represents the impact on the normalized contribution of each agent given information leakage due to data correlation $\rho_{3,4}$ between $\{a_3\} - \{a_4\}$ and the noise injection $\sigma_4$ by $\{a_4\}$.
  • Figure 3: (Left) Evolution of normalized payment to agents $\{a_2, a_3, a_4\}$ during training in the online regression market by the agent $\{a_{1}\}$. (Mid -- Right) Zoomed-in illustration depicting price variability for agents $\{a_2, a_4\}$.
  • Figure 4: Temporal evolution of the parameters over the period.
  • Figure 5: Impact of participation on the normalized loss estimates for different collaborative online learning scenarios with four agents: (i) Central Info, with only agent $\{a_{1}\}$, (ii) Partial Info, with agents $\{a_{1}, a_2, a_3\}$, and (iii) Full Info, with all agents.
  • ...and 3 more figures

Theorems & Definitions (15)

  • Definition 1
  • Definition 2: $\epsilon$-Local Differential Privacy [dwork2008differential]
  • Remark 1
  • Definition 3
  • Remark 2
  • Definition 4
  • Definition 5: Feasibility
  • Definition 6
  • Lemma 1
  • proof
  • ...and 5 more