Privacy-Aware Data Acquisition under Data Similarity in Regression Markets
Shashi Raj Pandey, Pierre Pinson, Petar Popovski
TL;DR
The paper addresses regression data markets where data owners' privacy preferences and feature correlations influence data value and participation. It introduces a two-stage Stackelberg framework with $\bc$-Local Differential Privacy to model a central learner purchasing features from privacy-aware agents, and develops utility models that capture privacy costs, information leakage, and contribution-based valuations. A low-complexity first-order backward-induction algorithm computes the Stackelberg equilibrium, with theoretical guarantees on feasibility, incentive compatibility, and individual rationality. Numerical results show that data similarity depresses data value and affects participation, while the mechanism can balance privacy and utility through price signaling and privacy budgeting. The work provides a principled design for privacy-aware data acquisition in distributed regression tasks, with implications for market efficiency and data-sharing incentives.
Abstract
Data markets facilitate decentralized data exchange for applications such as prediction, learning, or inference. The design of these markets is challenged by varying privacy preferences as well as data similarity among data owners. Related works have often overlooked how data similarity impacts pricing and data value through statistical information leakage. We demonstrate that data similarity and privacy preferences are integral to market design and propose a query-response protocol using local differential privacy for a two-party data acquisition mechanism. In our regression data market model, we analyze strategic interactions between privacy-aware owners and the learner as a Stackelberg game over the asked price and privacy factor. Finally, we numerically evaluate how data similarity affects market participation and traded data value.
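As an illustration of the local differential privacy step in the query-response protocol, the sketch below shows how a data owner could perturb a bounded feature value before reporting it to the learner. It is a minimal example under stated assumptions, not the paper's mechanism: it uses the standard Laplace mechanism, writes the privacy factor as `epsilon`, and assumes each feature value lies in a known interval `[lower, upper]`. The function name `ldp_laplace_response` and all numeric values are hypothetical.

```python
import random


def ldp_laplace_response(value, lower, upper, epsilon, rng):
    """Report one bounded scalar under epsilon-LDP using the Laplace mechanism.

    Assumption (not from the paper): the owner's value lies in [lower, upper],
    so any two possible inputs differ by at most (upper - lower).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Clip so the sensitivity bound (upper - lower) actually holds.
    clipped = min(max(value, lower), upper)
    # Laplace scale = sensitivity / epsilon: smaller epsilon means more noise.
    scale = (upper - lower) / epsilon
    # The difference of two independent Exp(1) draws is Laplace(0, 1).
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return clipped + noise


# Hypothetical feature values held by one owner, already scaled to [0, 1].
rng = random.Random(0)
feature = [0.2, 0.9, 0.5, 0.7]

# A strict privacy factor (small epsilon) yields noisier reports than a loose one.
reports_strict = [ldp_laplace_response(x, 0.0, 1.0, 0.5, rng) for x in feature]
reports_loose = [ldp_laplace_response(x, 0.0, 1.0, 5.0, rng) for x in feature]
```

In this sketch the privacy factor directly sets the noise scale, which is one way to see why the learner's offered price and the owner's chosen privacy level are coupled: a smaller `epsilon` protects the owner more but makes the reported feature less useful for regression.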
