On the Volatility of Shapley-Based Contribution Metrics in Federated Learning
Arno Geimer, Beltran Fiz, Radu State
TL;DR
This work examines whether Shapley-value-based contribution metrics in Federated Learning yield stable and fair rewards across common aggregation strategies. Using gradient-based One-Round Reconstruction on four image datasets with Dirichlet non-IID splits, it compares round-based contributions under eight aggregation methods to a size-based ground truth. Results show that while aggregate contributions often align with the ground truth, per-client rewards are highly sensitive to the chosen aggregation strategy, which raises serious stability and fairness concerns. The authors discuss the implications for trust in FL ecosystems and propose future-work directions, such as ensemble aggregation and scalable Shapley approximations, to improve robustness and practicality.
Abstract
Federated learning (FL) is a collaborative and privacy-preserving machine learning paradigm that allows the development of robust models without the need to centralize sensitive data. A critical challenge in FL lies in fairly and accurately allocating contributions from diverse participants. Inaccurate allocation can undermine trust and lead to unfair compensation, leaving participants with little incentive to join or actively contribute to the federation. Various remuneration strategies have been proposed to date, including auction-based approaches and Shapley-value-based methods, the latter offering a means to quantify the contribution of each participant. However, little to no work has studied the stability of these contribution evaluation methods. In this paper, we evaluate participant contributions in federated learning using gradient-based model reconstruction techniques with Shapley values and compare the round-based contributions to a classic data contribution measurement scheme. We provide an extensive analysis of the discrepancies in Shapley values across a set of aggregation strategies and examine them at both an overall and a per-client level. We show that Shapley values lead to unstable reward allocations among participants when the aggregation technique changes. Our analysis spans several data distributions of varying heterogeneity, including independent and identically distributed (IID) and non-IID scenarios.
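To make the notion of a Shapley-value-based contribution concrete, the following is a minimal sketch of an exact Shapley computation for three clients. It is not the authors' implementation: the function name `shapley_values`, the client labels, and the accuracy table `TOY_ACCURACY` are all hypothetical. The table stands in for the utility of a model built from a given subset of clients, which in the setting described above would come from gradient-based reconstruction rather than hard-coded numbers.

```python
from itertools import combinations
from math import factorial


def shapley_values(clients, utility):
    """Exact Shapley values; `utility` maps a frozenset of clients to a score.

    Enumerates every coalition, so it is only feasible for a handful of clients.
    """
    n = len(clients)
    values = {}
    for client in clients:
        others = [c for c in clients if c != client]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain `client`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                marginal = utility(coalition | {client}) - utility(coalition)
                total += weight * marginal
        values[client] = total
    return values


# Hypothetical utilities (e.g. test accuracy) for every coalition of clients.
# These numbers are made up for illustration only.
TOY_ACCURACY = {
    frozenset(): 0.10,
    frozenset({"A"}): 0.50,
    frozenset({"B"}): 0.40,
    frozenset({"C"}): 0.30,
    frozenset({"A", "B"}): 0.70,
    frozenset({"A", "C"}): 0.60,
    frozenset({"B", "C"}): 0.55,
    frozenset({"A", "B", "C"}): 0.80,
}

phi = shapley_values(["A", "B", "C"], TOY_ACCURACY.__getitem__)
for client, value in phi.items():
    print(f"client {client}: {value:.4f}")
```

The values sum to the utility of the full coalition minus that of the empty one (the efficiency property), which is what makes them usable as a reward split. Because each value depends on the utility of every coalition, a change in how client updates are aggregated changes the utilities and can therefore shift the individual rewards.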
