Local Data Quantity-Aware Weighted Averaging for Federated Learning with Dishonest Clients
Leming Wu, Yaochu Jin, Kuangrong Hao, Han Yu
TL;DR
This work addresses the vulnerability of server-side weighted aggregation in Federated Learning to dishonest client data-volume reporting. It introduces FedDua, which adds a client-side quantity-aware branch that predicts an adjustment factor $\alpha$ from the local update $\Delta\theta_i$, the learning rate $\eta$, and the expected gradient $\mathbb{E}[\nabla L(\theta_i)]$. The server verifies reported data volumes by comparing $\alpha$ against a pre-trained distribution, flags or excludes dishonest clients, and, when necessary, aggregates using the predicted data contributions instead of the reported ones. The approach is summarized by the relations $\alpha = f_{dua}(\varphi; \mathrm{embedding}(\text{client}_i))$ and $\text{Loss}_{dua} = \frac{1}{2}\left\| \frac{\Delta \theta_i}{\eta \mathbb{E}[\nabla L(\theta_i)] \alpha} - |D_i| \right\|^2$, where $R \approx \frac{\Delta \theta_i}{\eta \mathbb{E}[\nabla L(\theta_i)] \alpha}$ links the local update to the data volume. Empirical results on CIFAR-10 and MedMNIST show that FedDua yields an average improvement of $3.17\%$ over four popular FL aggregators in the presence of inaccurate data declarations, while incurring only modest client-side overhead and no extra communication. The method is modular and can be integrated into existing FL algorithms to improve robustness against data-volume manipulation; future work will extend it to data quality considerations.
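The relations in the TL;DR can be illustrated with a minimal Python sketch. It makes several assumptions that the summary does not state: the vector ratio $\Delta\theta_i / \mathbb{E}[\nabla L(\theta_i)]$ is read as a ratio of L2 norms, $\alpha$ is passed in as a known number rather than predicted by $f_{dua}$, and the server check is a simple relative-tolerance comparison standing in for the comparison against a pre-trained distribution. The names `estimate_quantity`, `dua_loss`, `check_report`, and `rel_tol` are hypothetical.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def estimate_quantity(delta_theta, expected_grad, lr, alpha):
    """Estimate R ~ ||delta_theta|| / (lr * ||E[grad]|| * alpha).

    Assumption: the vector division in the TL;DR is taken as a norm ratio.
    """
    return l2_norm(delta_theta) / (lr * l2_norm(expected_grad) * alpha)


def dua_loss(estimate, true_quantity):
    """Squared-error loss 0.5 * (R - |D_i|)^2 between estimate and true volume."""
    return 0.5 * (estimate - true_quantity) ** 2


def check_report(estimate, reported, rel_tol=0.2):
    """Hypothetical server-side rule, not the distribution test from the source.

    A report is accepted if it lies within rel_tol of the estimate;
    otherwise the estimate replaces it as the aggregation weight.
    """
    honest = abs(estimate - reported) <= rel_tol * estimate
    return honest, (reported if honest else estimate)


# Toy values chosen only to keep the arithmetic easy to follow.
r = estimate_quantity([300.0, 400.0], [3.0, 4.0], lr=0.5, alpha=0.25)
print(r, dua_loss(r, 800))
print(check_report(r, 790))   # report close to the estimate
print(check_report(r, 5000))  # heavily inflated report
```

In this toy setting the inflated report is rejected and the estimate is used as the weight, which mirrors the fallback to predicted data contributions described above.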
Abstract
Federated learning (FL) enables collaborative training of deep learning models without requiring data to leave local clients, thereby preserving client privacy. The aggregation process on the server plays a critical role in the performance of the resulting FL model. The most commonly used aggregation method is weighted averaging based on the amount of data held by each client, which is assumed to reflect each client's contribution. However, this method is prone to model bias, because dishonest clients may report inaccurate training data volumes to the server, and such reports are hard to verify. To address this issue, we propose a novel secure Federated Data quantity-aware weighted averaging method (FedDua). It enables FL servers to accurately predict the amount of training data held by each client from the local model gradients the client uploads. Furthermore, it can be seamlessly integrated into any FL algorithm that involves server-side model aggregation. Extensive experiments on three benchmark datasets demonstrate that FedDua improves global model performance by an average of 3.17% compared to four popular FL aggregation methods in the presence of inaccurate client data volume declarations.
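To make the bias concrete, the following Python sketch implements plain quantity-weighted averaging, $\theta = \sum_i \frac{n_i}{\sum_j n_j} \theta_i$, and shows how a single inflated report shifts the aggregate. The client parameters and data volumes are invented toy values, and `weighted_average` is a hypothetical helper rather than code from the paper.

```python
def weighted_average(client_params, quantities):
    """Average flat parameter lists, weighting each client by its data volume."""
    total = sum(quantities)
    dim = len(client_params[0])
    return [
        sum(q * params[j] for params, q in zip(client_params, quantities)) / total
        for j in range(dim)
    ]


# Three clients with two parameters each; the third client's model is an outlier.
client_params = [[1.0, 1.0], [1.0, 1.0], [5.0, 5.0]]

# All clients truthfully report 100 samples.
honest_avg = weighted_average(client_params, [100, 100, 100])

# The third client claims 1800 samples while the others stay at 100.
skewed_avg = weighted_average(client_params, [100, 100, 1800])

print(honest_avg)
print(skewed_avg)
```

With truthful reports each aggregated parameter is 7/3; with the inflated report it moves to 4.6, close to the dishonest client's own value of 5. This is the kind of drift that verifying or predicting data volumes is meant to prevent.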
