Data Exchange Markets via Utility Balancing

Aditya Bhaskara, Sreenivas Gollapudi, Sungjin Im, Kostas Kollias, Kamesh Munagala, Govind S. Sankar

TL;DR

The paper designs a data-exchange market without monetary transfers that balances interim utility across heterogeneous datasets and ML tasks via a central clearinghouse. It develops a formal Data Exchange Problem with sharing rules (notably Shapley value and proportional sharing) and proves NP-hardness of welfare maximization, then provides polynomial-time approximation algorithms using a multiplicative weights framework, including results for submodular and concave utilities. The work further shows core stability results, including existence of ε-approximate cores and efficient 2-stable Greedy constructions, and analyzes strategic behavior with respect to welfare; it also extends to imbalanced exchanges. Empirical validation on road-traffic mean-estimation tasks demonstrates substantial welfare gains over pairwise trading baselines, illustrating practical impact for collaborative data sharing in heterogeneous ML settings.

Abstract

This paper explores the design of a balanced data-sharing marketplace for entities with heterogeneous datasets and machine learning models that they seek to refine using data from other agents. The goal of the marketplace is to encourage participation for data sharing in the presence of such heterogeneity. Our market design approach for data sharing focuses on interim utility balance, where participants contribute and receive equitable utility from refinement of their models. We present such a market model for which we study computational complexity, solution existence, and approximation algorithms for welfare maximization and core stability. We finally support our theoretical insights with simulations on a mean estimation task inspired by road traffic delay estimation.

Paper Structure (46 sections, 19 theorems, 45 equations, 3 figures, 1 algorithm)


Key Result

Theorem 5

The welfare maximization objective in Data Exchange is NP-Hard for submodular utilities and Shapley value sharing.
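As background for the terms in Theorem 5, the following is a minimal sketch of the standard Shapley value computed for a small submodular (coverage) utility. It is not the paper's algorithm or its hardness construction. The agent names, the `datasets` dictionary, and the `coverage` utility are hypothetical, and how the paper applies Shapley values in its sharing rule is not shown in this summary.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, utility):
    """Exact Shapley values: average each player's marginal contribution
    over all orderings of the players (exponential time, toy sizes only)."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += utility(with_p) - utility(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}


# Hypothetical toy data: the items each agent's dataset covers.
datasets = {"a": {1, 2}, "b": {2, 3}, "c": {3}}


def coverage(coalition):
    """Number of distinct items covered by a coalition (a submodular function)."""
    covered = set()
    for p in coalition:
        covered |= datasets[p]
    return len(covered)


values = shapley_values(list(datasets), coverage)
print(values)  # {'a': 1.5, 'b': 1.0, 'c': 0.5}
```

The three values sum to 3, the utility of the full coalition, which is the efficiency property of the Shapley value. Enumerating all orderings is exponential in the number of agents, so this sketch is only meant for very small examples.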

Figures (3)

  • Figure 1: Construction for Theorem 5. The X3C instance has elements labelled. Blue boxes correspond to the $Q_i$ and red boxes correspond to the $P_i$.
  • Figure 2: Instance for the proof of \ref{thm:core-neg}.
  • Figure 3: (a) Box plots of the total utility of the algorithm and benchmark (matching) solutions, measured as a fraction of the baseline. (b,c) Total utility of the algorithm and matching benchmark with varying levels of correlation, again measured as a fraction of the baseline. Figure (b) is Random correlation, and (c) is Local correlation.

Theorems & Definitions (37)

  • Example 1
  • Example 2
  • Example 3
  • Example 4
  • Theorem 5: Proved in \ref{sec:hardness}
  • Theorem 6: Proved in \ref{sec:approx}
  • Theorem 7: Proved in \ref{sec:continuous}
  • Definition 8
  • Proof of Theorem 5
  • Lemma 9
  • ...and 27 more