The Fair Value of Data Under Heterogeneous Privacy Constraints in Federated Learning
Justin Kang, Ramtin Pedarsani, Kannan Ramchandran
TL;DR
This work addresses fair valuation and incentive design for data contributed under heterogeneous privacy constraints in federated learning. It introduces two axiomatic fairness notions, one that treats the platform and users as coequal parties and one among users only. Both are rooted in Shapley-value-like decompositions and extended to privacy-aware utilities $U(\boldsymbol{\rho})$. Through a heterogeneous privacy framework and mean-estimation examples, the paper identifies three regimes of platform behavior as privacy sensitivity varies, and provides mechanism-design algorithms to compute Nash equilibria under fair payments. The results show how data quantity, privacy level, and heterogeneity jointly determine fair payments and platform strategies, offering a principled baseline for privacy-aware data markets and FL incentive design. In practice, the results can guide regulators and platforms toward transparent, fair, and efficient data acquisition policies under realistic privacy constraints.
Abstract
Modern data aggregation often involves a platform collecting data from a network of users with various privacy options. Platforms must solve the problem of how to allocate incentives to users to convince them to share their data. This paper proposes a notion of a \textit{fair} amount to compensate users for their data at a given privacy level, based on an axiomatic definition of fairness along the lines of the celebrated Shapley value. To the best of our knowledge, these are the first fairness concepts for data that explicitly consider privacy constraints. We also formulate a heterogeneous federated learning problem for the platform with privacy level options for users. By studying this problem, we investigate the amount of compensation users receive under fair allocations with different privacy levels, amounts of data, and degrees of heterogeneity. We also discuss what happens when the platform is forced to design fair incentives. Under certain conditions, we find that when privacy sensitivity is low, the platform sets incentives to ensure that it collects all the data at the lowest privacy option. When privacy sensitivity is above a given threshold, the platform provides no incentives to users. Between these two extremes, the platform sets the incentives so that some fraction of the users choose the higher privacy option and the rest choose the lower privacy option.
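To make the Shapley-value connection concrete, the following minimal sketch computes the classical Shapley value for a small set of users whose contributions depend on a privacy parameter. It is an illustration only and not the paper's fairness definition: the function names, the toy utility `toy_utility`, the example vector `RHO`, and the convention that a larger entry of `RHO` means more informative (less private) data are all assumptions made for this example.

```python
from itertools import combinations
from math import factorial, sqrt


def shapley_values(n, utility):
    """Classical Shapley value for players 0..n-1.

    `utility` maps a frozenset of player indices to a real number.
    Enumerates all coalitions, so it is only practical for small n.
    """
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that excludes player i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of player i to coalition s.
                values[i] += weight * (utility(s | {i}) - utility(s))
    return values


# Hypothetical privacy levels, one per user (assumption: larger means
# the user shares more informative data; 0.0 means nothing is shared).
RHO = [1.0, 0.5, 0.0]


def toy_utility(coalition):
    """Hypothetical concave utility of the data shared by a coalition."""
    return sqrt(sum(RHO[i] for i in coalition))


if __name__ == "__main__":
    for user, value in enumerate(shapley_values(len(RHO), toy_utility)):
        print(f"user {user}: rho={RHO[user]}, Shapley value={value:.4f}")
```

In this toy setting the values sum to the utility of the full coalition (efficiency), and the user who shares nothing receives zero (null player), which are two of the axioms that characterize the Shapley value.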
