Incentivizing Time-Aware Fairness in Data Sharing

Jiangwei Chen, Kieu Thao Nguyen Pham, Rachael Hwee Ling Sim, Arun Verma, Zhaoxuan Wu, Chuan-Sheng Foo, Bryan Kian Hsiang Low

TL;DR

This paper addresses asynchronous data sharing in collaborative ML by introducing time-aware incentives that reward early contributions while preserving fairness. The authors formalize incentive conditions and data-valuation requirements, and propose two reward schemes that combine joining-time information with Shapley-based concepts. Reward values can be realized exactly or approximately as model rewards via likelihood tempering or subset selection. Empirical results on synthetic and real-world datasets show that earlier joiners receive higher rewards that still satisfy individual rationality (IR), and that model performance improves with collaboration. The framework balances data value against timing, offering practical mechanisms for motivating timely, high-quality data sharing when parties do not join simultaneously. The approach raises computational and privacy considerations, which motivate future work on scalability, privacy-preserving variants, and extensions to repeated or online data sharing.

Abstract

In collaborative data sharing and machine learning, multiple parties aggregate their data resources to train a machine learning model with better performance. However, as the parties incur data collection costs, they are only willing to do so when incentives such as fairness and individual rationality are guaranteed. Existing frameworks assume that all parties join the collaboration simultaneously, which does not hold in many real-world scenarios. Due to long processing times for data cleaning, difficulty in overcoming legal barriers, or unawareness, the parties may join the collaboration at different times. In this work, we propose the following perspective: as a party who joins earlier incurs higher risk and encourages contributions from other wait-and-see parties, that party should receive a reward of higher value for sharing data earlier. To this end, we propose a fair and time-aware data sharing framework, including novel time-aware incentives. We develop new methods for deciding reward values to satisfy these incentives. We further illustrate how to generate model rewards that realize the reward values and empirically demonstrate the properties of our methods on synthetic and real-world datasets.

Paper Structure

This paper contains 37 sections, 10 theorems, 11 equations, 12 figures.

Key Result

Theorem 6.1

For each party $i \in N$, define its Shapley value at time $\tau$ as $\varphi_{i}^{(\tau)} \triangleq \varphi_i(v_{(\cdot)}^{(\tau)}, N_{\tau})$ if $i \in N_{\tau}$, and $\varphi_{i}^{(\tau)} \triangleq v_i$ otherwise. Let the weight of time interval $t$ be $w^{(t)} \triangleq \beta^{t} / \sum_{\tau = 0}^T \b

Figures (12)

  • Figure 1: Overview of our data sharing problem setting (App. \ref{app:justify-setting}) and the impact of fairness and our time-aware incentives (Sec. \ref{sec:incentives}).
  • Figure 2: Overview of our proposed methods. (A) We partition the collaboration period into time intervals and consider separate cooperative games for each, rewarding parties via a weighted sum of the corresponding Shapley values (Sec. \ref{sec:weight-with-time}). (B) We propose a new time-aware data valuation function and directly use the resulting Shapley values as the reward values (Sec. \ref{sec:value-with-time}).
  • Figure 3: Graphs of $r_i^*$ vs. $t_1$ with the Friedman dataset using methods in (a) Sec. \ref{sec:weight-with-time} (b) Sec. \ref{sec:value-with-time}.
  • Figure 4: Graphs of differences between reward values with the Friedman dataset.
  • Figure 5: Graphs of (a, b) reward values and (c, d) MNLP vs. joining time $t_1$ on the CaliH dataset.
  • ...and 7 more figures
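
The scheme described in the Figure 2(A) caption, a weighted sum of per-interval Shapley values, can be illustrated with a minimal Python sketch. This is not the authors' implementation. Several pieces are assumptions made only for illustration: the toy valuation `v`, the helper names `shapley` and `time_weighted_rewards`, the reading of $v_i$ as the standalone value $v(\{i\})$, the use of the same valuation in every interval, and the normalized geometric weights (the weight formula in Theorem 6.1 is truncated in the text above, so the normalization used here is a guess).

```python
from itertools import permutations


def shapley(players, v):
    """Exact Shapley values: average marginal contribution over all orderings."""
    players = list(players)
    phi = {i: 0.0 for i in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for i in order:
            phi[i] += v(coalition | {i}) - v(coalition)
            coalition = coalition | {i}
    return {i: phi[i] / len(orderings) for i in players}


def time_weighted_rewards(parties, join_time, v, T, beta):
    """Weighted sum over intervals t = 0..T of per-interval Shapley values.

    ASSUMPTIONS (not confirmed by the source):
      * weights are beta**t normalized to sum to 1;
      * a party absent in interval t is credited v({i}), its standalone value;
      * the same valuation v is used in every interval.
    """
    raw = [beta ** t for t in range(T + 1)]
    weights = [w / sum(raw) for w in raw]
    rewards = {i: 0.0 for i in parties}
    for t, w in enumerate(weights):
        present = [i for i in parties if join_time[i] <= t]
        phi = shapley(present, v)
        for i in parties:
            value = phi[i] if i in phi else v(frozenset({i}))
            rewards[i] += w * value
    return rewards


# Hypothetical toy game: coalition value grows with the square of its size.
toy_v = lambda coalition: len(coalition) ** 2
rewards = time_weighted_rewards(
    parties=["A", "B", "C"],
    join_time={"A": 0, "B": 0, "C": 1},
    v=toy_v,
    T=1,
    beta=0.5,
)
print(rewards)
```

In this toy game the three parties are interchangeable except for their joining times, so any difference in reward comes from timing alone: A and B, who join in interval 0, end up with a larger reward than C, who joins in interval 1. Exact enumeration over orderings is only practical for a handful of parties.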

Theorems & Definitions (16)

  • Theorem 6.1
  • Remark 6.2: Efficient Estimation due to Linearity
  • Theorem 6.3
  • Lemma F.1
  • proof
  • Corollary F.2
  • proof
  • Lemma F.3: Necessity
  • proof
  • Lemma F.4: Symmetry
  • ...and 6 more