Equilibria of Data Marketplaces with Privacy-Aware Sellers under Endogenous Privacy Costs

Diptangshu Sen, Jingyan Wang, Juba Ziani

TL;DR

This paper studies a two-sided data marketplace where user privacy costs are endogenous and depend on downstream buyers' data purchases. It develops a tractable model in which a platform posts a price $P$, buyers choose how much data (mass) to purchase subject to their budgets, and users decide whether to participate based on the utility $U_v = Q(\alpha) - \mathbb{E}[v \cdot \#\text{buyers}(v)]$. Here the privacy cost is $c(v, n_v) = v\,n_v$ for a user with privacy valuation $v$ whose data is bought by $n_v$ buyers, and $v$ is drawn from a distribution. A key contribution is showing that market equilibria reduce to a single scalar equation in the participation rate $\alpha$, which enables efficient computation via grid search and closed-form characterizations in simple settings with uniform privacy valuations, for both constant and linear participation benefits. The paper compares endogenous and exogenous privacy costs, highlighting substantial qualitative differences in equilibrium structure, and extends the analysis with simulations and semi-synthetic experiments that relax the distributional assumptions and allow non-linear benefit functions, revealing a rich multiplicity of equilibria and threshold phenomena. Together, these results clarify how the platform's pricing decisions influence participation and privacy costs, with implications for the design and regulation of data marketplaces.
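The reduction to one scalar equation in $\alpha$ solved by grid search can be illustrated on a toy instance. The Python sketch below rests on assumptions of our own, not the paper's exact model: each buyer $k$ with budget $B_k$ buys data mass $\min(B_k/P, \alpha)$ uniformly at random from the participants, valuations are $v \sim \text{Uniform}[0, V]$, and a user joins when $v \le Q(\alpha)/\bar n(\alpha)$, where $\bar n(\alpha)$ is the expected number of buyers per participant. All function names are hypothetical. Equilibria are taken to be the zeros of the residual (best response minus $\alpha$), located by scanning a grid for exact zeros and sign changes.

```python
import numpy as np


def expected_buyers(alpha, budgets, price):
    """Expected number of buyers purchasing one participant's data (toy model).

    Assumption (ours, hypothetical): buyer k buys mass min(B_k / P, alpha)
    uniformly at random from the alpha mass of participants.
    """
    budgets = np.asarray(budgets, dtype=float)
    if alpha <= 0.0:
        # Limit as alpha -> 0: every buyer can afford all participants.
        return float(len(budgets))
    return float(np.minimum(budgets / price, alpha).sum() / alpha)


def best_response(alpha, benefit, budgets, price, v_max=1.0):
    """Fraction of users who join when they expect participation rate alpha."""
    # benefit(alpha) plays the role of Q(alpha); users with v <= threshold join.
    threshold = benefit(alpha) / expected_buyers(alpha, budgets, price)
    # CDF of Uniform[0, v_max], clipped to [0, 1].
    return min(1.0, max(0.0, threshold / v_max))


def equilibria_by_grid(benefit, budgets, price, v_max=1.0, num=10_001, tol=1e-12):
    """Approximate solutions of alpha = best_response(alpha) on [0, 1]."""
    grid = np.linspace(0.0, 1.0, num)
    resid = np.array(
        [best_response(a, benefit, budgets, price, v_max) - a for a in grid]
    )
    found = []
    for i in range(num):
        if abs(resid[i]) <= tol:
            found.append(float(grid[i]))  # (numerically) exact zero at a grid point
        elif i + 1 < num and abs(resid[i + 1]) > tol and resid[i] * resid[i + 1] < 0:
            # Sign change: linearly interpolate the root between grid[i] and grid[i + 1].
            a0, a1 = grid[i], grid[i + 1]
            found.append(float(a0 - resid[i] * (a1 - a0) / (resid[i + 1] - resid[i])))
    return found


if __name__ == "__main__":
    # Constant benefit Q = 2 with K = 2 buyers (budgets 1, 1) at price P = 1.
    print(equilibria_by_grid(lambda a: 2.0, [1.0, 1.0], 1.0))  # [1.0]
    # Lower constant benefit Q = 0.5: a single partial-participation root near 0.25.
    print(equilibria_by_grid(lambda a: 0.5, [1.0, 1.0], 1.0))
```

A plain grid scan is used because it also exposes multiple equilibria and flat stretches where many $\alpha$ solve the equation; a bracketing root finder could refine each detected sign change if more precision were needed.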

Abstract

We study a two-sided online data ecosystem consisting of an online platform, users on the platform, and downstream learners or data buyers. The learners can buy user data on the platform (to run a statistical or machine learning task). Potential users decide whether to join by weighing i) their benefit from joining the platform and interacting with other users against ii) the privacy costs they incur from sharing their data. First, we introduce a novel modeling element for two-sided data platforms: the privacy costs of the users are endogenous and depend on how much of their data is purchased by the downstream learners. Then, we characterize marketplace equilibria in certain simple settings. In particular, we provide a full characterization in two variants of our model that correspond to different utility functions for the users: i) when each user gets a constant benefit for participating in the platform and ii) when each user's benefit is linearly increasing in the number of other users that participate. In both variants, equilibria in our setting are significantly different from equilibria when privacy costs are exogenous and fixed, highlighting the importance of taking the endogeneity of privacy costs into account. Finally, we provide simulations and semi-synthetic experiments that extend our results under more general assumptions, experimenting with different distributions of users' privacy costs and different functional forms of the users' utilities for joining the platform.
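The participation trade-off can be written out from the user utility $U_v = Q(\alpha) - \mathbb{E}[v \cdot \#\text{buyers}(v)]$. The derivation below is a sketch under one added assumption of ours: every participant faces the same expected number of buyers, denoted $\bar n(\alpha)$ (a symbol we introduce), and $F$ is the CDF of the privacy valuation $v$.

```latex
\begin{aligned}
U_v &= Q(\alpha) - \mathbb{E}\big[v \cdot \#\text{buyers}(v)\big] = Q(\alpha) - v\,\bar n(\alpha), \\
U_v \ge 0 &\iff v \le v^*(\alpha) := \frac{Q(\alpha)}{\bar n(\alpha)}, \\
\alpha &= F\big(v^*(\alpha)\big) = F\!\left(\frac{Q(\alpha)}{\bar n(\alpha)}\right).
\end{aligned}
```

Any $\alpha \in [0, 1]$ solving the last line is a self-consistent participation rate; for $v \sim \text{Uniform}[0, V]$, $F(x) = \min\{1, x/V\}$ when $x \ge 0$. The endogeneity enters through $\bar n(\alpha)$: how many buyers purchase a participant's data depends on how many users participate.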

Paper Structure

This paper contains 55 sections, 16 theorems, 160 equations, and 14 figures.

Key Result

Theorem 4.1

Suppose $\mathcal{Q} \geq K$. Then for every price $P > 0$, there exists a unique equilibrium, given by $\alpha = 1$.
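A short intuition check for Theorem 4.1, assuming (our reading, not stated in this summary) that privacy valuations are normalized to $v \in [0, 1]$, that $K$ is the number of buyers so no user's data is bought more than $K$ times, and that $\mathcal{Q}$ is the constant participation benefit:

```latex
v\, n_v \;\le\; 1 \cdot K \;\le\; \mathcal{Q}
\quad\Longrightarrow\quad
U_v = \mathcal{Q} - \mathbb{E}[v\, n_v] \;\ge\; 0
\quad \text{for every } v \text{ and every } \alpha .
```

Under these assumptions, joining is weakly optimal for every user regardless of what others do and regardless of the posted price $P$, which is consistent with full participation $\alpha = 1$ at every $P > 0$; the uniqueness argument itself is given in the paper.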

Figures (14)

  • Figure 2: The user participation rate $\alpha$ (y-axis) as a function of the platform price $P$ (x-axis) in the low benefit case, where $\mathcal{Q} B_{K} < B_{\leq K}$. Multiple equilibria arise at the threshold price $\bar{P} = \gamma(K) = \frac{\mathcal{Q} B_K}{B_{\le K}}$.
  • Figure 3: Market equilibria in the constant benefit case when user privacy costs are exogenous and uniformly distributed in $[0, V]$.
  • Figure 4: The user participation rate $\alpha$ (y-axis) as a function of the platform price $P$ (x-axis) in the low benefit case, when $CN < K$ and $\hat{k}$ exists. The red line shows partial participation equilibria and the blue line shows full participation equilibria; multiple equilibria can coexist at a single price $P$.
  • Figure 5: The user participation rate $\alpha$ (y-axis) as a function of the platform price $P$ (x-axis) in the special case $CN = K$. The shaded red region shows partial participation equilibria and the blue line shows full participation equilibria; there can be infinitely many equilibria at a single price $P$.
  • Figure 6: Market equilibria in the linear benefit case when user privacy costs are entirely exogenous and uniformly distributed in $[0, V]$.
  • ...and 9 more figures

Theorems & Definitions (60)

  • Remark 2.1: Range of $v$
  • Definition 3.1
  • Claim 3.1
  • Proof
  • Definition 4.1: Cumulative Budget Notation
  • Theorem 4.1: High Benefit Case
  • Theorem 4.2: Moderate Benefit Case
  • Theorem 4.4: High Benefit Case
  • Theorem 4.5: Low Benefit Case
  • Theorem 4.6: Special Case
  • ...and 50 more