On the Theoretical Foundations of Data Exchange Economies

Hannaneh Akrami, Bhaskar Ray Chaudhury, Jugal Garg, Aniket Murhekar

TL;DR

This work develops a theoretical foundation for data exchange economies in which data is a replicable asset, introducing fairness via utility-sharing (e.g., Shapley shares) and core-stability against coalitions. It proves the existence of exchanges that are both fair and core-stable for all monotone continuous utilities and sharing rules satisfying monotonicity, normalization, and efficiency, using a fixed-point construction on a convex-like domain $Z$ and a reciprocity-enforcing map $f$, with the fixed point mapping to a reciprocal, fair exchange. For computability, the authors embed the domain to obtain PPAD-membership and design a local-search algorithm that achieves $\varepsilon$-core-stable and $\varepsilon$-reciprocal exchanges under cross-monotone shares and $L$-Lipschitz utilities, placing the problem in CLS (PPAD $\cap$ PLS). They discuss perturbations to ensure non-satiation, present a sequence of algorithmic steps (decreasing/increasing data flows) to balance surpluses, and outline open questions for extending to supermodular utilities and decentralized data exchange. Overall, the paper offers a principled framework and computational pathways for fair, stable data exchanges, highlighting foundational directions in data economics and inviting further exploration of dynamics, decentralization, and complexity boundaries.

Abstract

The immense success of ML systems relies heavily on large-scale, high-quality data. The high demand for data has led to many paradigms that involve selling, exchanging, and sharing data, motivating the study of economic processes with data as an asset. However, data differs from classical economic assets in terms of free duplication: there is no concept of limited supply since it can be replicated at zero marginal cost. This distinction introduces fundamental differences between economic processes involving data and those concerning other assets. We study a parallel to exchange (Arrow-Debreu) markets where data is the asset. Here, agents with datasets exchange data fairly and voluntarily, aiming for mutual benefit without monetary compensation. This framework is particularly relevant for non-profit organizations that seek to improve their ML models through data exchange, yet are restricted from selling their data for profit. We propose a general framework for data exchange, built on two core principles: (i) fairness, ensuring that each agent receives utility proportional to their contribution to others; contributions are quantifiable using standard credit-sharing functions like the Shapley value, and (ii) stability, ensuring that no coalition of agents can identify an exchange among themselves which they unanimously prefer to the current exchange. We show that fair and stable exchanges exist for all monotone continuous utility functions. Next, we investigate the computational complexity of finding approximate fair and stable exchanges. We present a local search algorithm for instances with monotone submodular utility functions, where each agent's contributions are measured using the Shapley value. We prove that this problem lies in CLS under mild assumptions. Our framework opens up several intriguing theoretical directions for research in data economics.

Paper Structure

This paper contains 47 sections, 33 theorems, 29 equations, 6 figures, 4 algorithms.

Key Result

Theorem 1

A reciprocal and core-stable exchange exists for all monotone continuous utility functions and any credit-sharing functions satisfying monotonicity, normalization, and efficiency.
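The theorem is stated for credit-sharing functions that satisfy monotonicity, normalization, and efficiency, and the abstract names the Shapley value as a standard choice. The sketch below is only an illustration of that standard choice, not the paper's construction: it computes exact Shapley shares for a small coverage utility, which is monotone and submodular, and checks efficiency in the usual cooperative-game sense (the shares sum to the value of the grand coalition). The names `shapley_shares`, `coverage`, and `DATASETS` are hypothetical, and the paper's own definitions of the three properties are not reproduced in this summary.

```python
from itertools import permutations
from math import factorial


def shapley_shares(players, value):
    """Exact Shapley value: average each player's marginal contribution
    over all orderings. Exponential in len(players); illustration only."""
    shares = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            shares[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: s / n_orders for p, s in shares.items()}


# Hypothetical toy data: each contributor holds a set of record ids.
DATASETS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}}


def coverage(coalition):
    """Assumed utility: number of distinct records covered by a coalition
    (a monotone submodular function with coverage(empty) == 0)."""
    covered = set()
    for p in coalition:
        covered |= DATASETS[p]
    return float(len(covered))


shares = shapley_shares(list(DATASETS), coverage)
total = sum(shares.values())
print(shares)  # shares for a, b, c are approximately 2.5, 1.0, 3.5
print(total)   # approximately 7.0, the value of the grand coalition
```

Contributor `b` receives the smallest share because its records overlap with both other datasets, so its marginal contribution is small in most orderings.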

Figures (6)

  • Figure 1: Illustration of the issue of non-separability. The square nodes correspond to the agents, and an edge $\overrightarrow{(i,j)}$ between agents $i$ and $j$ implies that $x_{ij} > 0$, i.e., $i$ shares some of her data with $j$. Given a reciprocal exchange $\mathbf{x}$, say we have a coalition $S$ ($S = \{i_1, i_2\}$ in our example) such that all agents in $S$ prefer $\mathbf{y}$ to $\mathbf{x}$. Performing any local update to the exchange $\mathbf{x}$ (say, increasing the data flow between $i_1$ and $i_2$) can affect the utility contributions ($\psi_{ij}(\cdot)$s) of all agents that send data flow to agents in $S$ (agent $i_4$).
  • Figure 2: Illustration of how an acyclic exchange graph will ensure core-stability.
  • Figure 3: Illustration of our fixed-point proof. Since $Z^{+}$ is the set of agents with non-negative surplus and $Z^{-}$ is the set of agents with negative surplus, there exist an $i \in Z^{+}$ and a $j \in Z^{-}$ such that $z_{ij} > 0$ (meaning $x_{ij} > 0$). If reducing $z_{ij}$ is not feasible, then there exists a path $j \rightarrow a \rightarrow b \rightarrow c \rightarrow d \rightarrow i$ in $G(\mathbf{x}, \alpha(\varepsilon))$. Clearly $\Delta_i(b) < \Delta_i(c)$ and $z_{bc} < \log M$. So increasing $z_{bc}$ is feasible, implying that if $Z^{-} \neq \emptyset$, then there exist agents $i$ and $j$ such that $\Delta_i(z) > \Delta_j(z)$ and either decreasing $z_{ij}$ is feasible or increasing $z_{ji}$ is feasible.
  • Figure 4: Illustration of the main bottleneck when we decrease data flow from $S$ to $\mathcal{N}\setminus S$. The top shows the surplus profiles before we reduce the data flow from $i \in S$ to $j \in \mathcal{N} \setminus S$. The bottom shows the surplus levels after the decrease. The red dashed lines in the histogram show the surplus levels prior to the change. The surplus of every agent in $\mathcal{N} \setminus S$ cannot decrease (in fact, the surpluses of agents $j$, $k$, and $\ell$ strictly increase), and the surplus of agent $i$ decreases. Both of these changes are desirable because they balance the surplus profiles. Unfortunately, the surplus of another agent $\tilde{i} \in S$ can also increase, since $\psi_{\tilde{i}j}$ can increase as we decrease the data flow from $i$ to $j$.
  • Figure 5: Definition of the functions $\beta^+_{ij}(z)$ and $\beta^-_{ij}(z)$. The values $\beta^+_{ij}(z)$ and $\beta^-_{ij}(z)$ are the distances from $z$ to the boundary of the convex set $Z$ along the directions $\mathbf{e}_{ij}$ and $-\mathbf{e}_{ij}$ respectively.
  • ...and 1 more figure
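The Figure 5 caption defines $\beta^+_{ij}(z)$ and $\beta^-_{ij}(z)$ as distances from $z$ to the boundary of the convex set $Z$ along $\mathbf{e}_{ij}$ and $-\mathbf{e}_{ij}$. As a rough sketch of how such a distance can be evaluated numerically, the code below bisects along a ray using only a membership test. It is not taken from the paper: the function `boundary_distance`, the oracle `in_box`, and the unit box standing in for $Z$ are all hypothetical, and the actual domain $Z$ is not specified in this summary. Bisection is valid here because, for a convex set, the points of a ray from a member point that remain inside the set form an interval.

```python
def boundary_distance(z, direction, contains, upper=1.0, tol=1e-9):
    """Distance from a point z inside a bounded convex set to the set's
    boundary along `direction`, found by bisection on a membership oracle."""
    if not contains(z):
        raise ValueError("z must lie inside the set")

    def step(t):
        return [zi + t * di for zi, di in zip(z, direction)]

    # Grow the bracket until its far end leaves the set (set assumed bounded).
    hi = upper
    while contains(step(hi)):
        hi *= 2.0
    lo = 0.0
    # Invariant: step(lo) is inside the set, step(hi) is outside.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if contains(step(mid)):
            lo = mid
        else:
            hi = mid
    return lo


def in_box(point, bound=1.0):
    """Hypothetical stand-in for Z: the closed box [0, bound]^n."""
    return all(0.0 <= c <= bound for c in point)


z = [0.25, 0.5]
e = [1.0, 0.0]                      # plays the role of a unit direction e_ij
beta_plus = boundary_distance(z, e, in_box)
beta_minus = boundary_distance(z, [-c for c in e], in_box)
print(beta_plus, beta_minus)        # approximately 0.75 and 0.25
```

For a set with a known closed form, such as the box used here, the distances can be read off directly; the oracle-based version only illustrates the definition for a general convex domain.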

Theorems & Definitions (68)

  • Theorem 1
  • Theorem 2
  • Definition 1: Data Exchange Problem
  • Definition 2: Utility sharing function
  • Proposition 1: Shapley shares
  • proof
  • Definition 3: Reciprocal exchange
  • Definition 4
  • Definition 5
  • Definition 6
  • ...and 58 more