Addressing Data Quality Decompensation in Federated Learning via Dynamic Client Selection
Qinjun Fei, Nuria Rodríguez-Barroso, María Victoria Luzón, Zhongliang Zhang, Francisco Herrera
TL;DR
This paper addresses data quality decompensation in cross-silo Federated Learning by introducing SBRO-FL, a unified framework that combines dynamic bidding, prospect-theory–driven reputation, and budget-aware client selection. It uses Shapley-value–based contribution evaluation to quantify each client's marginal impact ($sv_i^t$), updates reputations through a risk-sensitive mechanism, and formulates client selection as a 0-1 integer program. The approach jointly optimizes data reliability, incentive compatibility, and cost efficiency, and it demonstrates improved accuracy, faster convergence, and robustness against adversarial bidding and noisy data across four datasets. By balancing data quality, economic feasibility, and participation incentives, the work has practical implications for scalable, trustworthy FL deployments.
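To make the notion of a client's marginal impact concrete, the following is a minimal sketch of an exact Shapley-value computation over a small set of clients. It is an illustration only, not the paper's implementation: the names `shapley_values` and `toy_utility`, the client labels, and the additive toy utility (a stand-in for something like validation accuracy of a model aggregated from a coalition) are all assumptions introduced here.

```python
from itertools import combinations
from math import factorial


def shapley_values(clients, utility):
    """Exact Shapley value of each client for a coalition utility function.

    `utility` maps a frozenset of client ids to a real number.
    Cost grows exponentially with the number of clients, so this is only
    practical for small client sets.
    """
    n = len(clients)
    values = {c: 0.0 for c in clients}
    for c in clients:
        others = [x for x in clients if x != c]
        for k in range(n):
            # Weight of a coalition of size k that does not contain c.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal gain from adding c to coalition s.
                values[c] += weight * (utility(s | {c}) - utility(s))
    return values


def toy_utility(coalition):
    # Hypothetical per-client effect on global performance (assumed numbers);
    # client "C" stands for a low-quality participant that hurts the model.
    quality = {"A": 0.30, "B": 0.20, "C": -0.05}
    return sum(quality[c] for c in coalition)


sv = shapley_values(["A", "B", "C"], toy_utility)
```

Because the toy utility is additive, each client's Shapley value equals its individual effect, and the values sum to the utility of the full coalition. With a real, non-additive utility the same function would still apply, but the values would reflect interactions between clients.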
Abstract
In cross-silo Federated Learning (FL), client selection is critical to ensuring high model performance, yet it remains challenging due to data quality decompensation, budget constraints, and incentive compatibility. As training progresses, these factors exacerbate client heterogeneity and degrade global performance. Most existing approaches treat these challenges in isolation, which makes it difficult to optimize them jointly. To address this, we propose Shapley-Bid Reputation Optimized Federated Learning (SBRO-FL), a unified framework that integrates dynamic bidding, reputation modeling, and cost-aware selection. Clients submit bids based on their perceived data quality, and their contributions are evaluated using Shapley values, which quantify each client's marginal impact on the global model. A reputation system inspired by prospect theory captures historical performance while penalizing inconsistency. The client selection problem is formulated as a 0-1 integer program that maximizes reputation-weighted utility under budget constraints. Experiments on the FashionMNIST, EMNIST, CIFAR-10, and SVHN datasets show that SBRO-FL improves accuracy, convergence speed, and robustness, even in adversarial and low-bid interference scenarios. Our results highlight the importance of balancing data reliability, incentive compatibility, and cost efficiency to enable scalable and trustworthy FL deployments.
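The budget-constrained selection step can be illustrated with a small sketch. It assumes the simplest form of such a 0-1 program: choose binary variables $x_i$ to maximize $\sum_i r_i x_i$ subject to $\sum_i b_i x_i \le B$, where $r_i$ is a client's reputation, $b_i$ its bid, and $B$ the budget. The exact objective used by SBRO-FL may differ; the function `select_clients`, the brute-force solver, and all numbers below are hypothetical.

```python
from itertools import combinations


def select_clients(reputation, bid, budget):
    """Solve a small 0-1 selection problem by exhaustive enumeration.

    Maximizes the total reputation of the selected clients subject to the
    sum of their bids not exceeding `budget`. Enumeration is exponential in
    the number of clients; a real deployment would use an ILP solver or a
    knapsack dynamic program instead.
    """
    ids = sorted(reputation)
    best_set, best_value = frozenset(), 0.0
    for k in range(1, len(ids) + 1):
        for subset in combinations(ids, k):
            cost = sum(bid[c] for c in subset)
            if cost > budget:
                continue  # infeasible: violates the budget constraint
            value = sum(reputation[c] for c in subset)
            if value > best_value:
                best_set, best_value = frozenset(subset), value
    return best_set, best_value


# Assumed example values: client "A" has the best reputation but a high bid.
reputation = {"A": 0.9, "B": 0.6, "C": 0.5, "D": 0.2}
bid = {"A": 5, "B": 3, "C": 3, "D": 1}

selected, total = select_clients(reputation, bid, budget=7)
```

In this toy instance the three cheaper clients B, C, and D are selected for a total reputation of 1.3, because adding A would leave room only for D. This shows why the selection has to weigh cost against reputation rather than simply ranking clients by reputation.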
