The Key of Parameter Skew in Federated Learning
Junfeng Liao, Sifan Wang, Ye Yuan, Riquan Zhang
TL;DR
Federated Learning under non-IID data suffers from parameter skew in local models, which biases the global parameter estimation. The authors propose FedSA, a dispersion-aware aggregation strategy that uses the coefficient of variation to separate high-dispersion from low-dispersion parameters and introduces Micro-Classes (MIC) and Macro-Classes (MAC) to weight high-dispersion updates during global aggregation. The global objective is $ \text{Loss} = \frac{\sum_{i=1}^N |D^i| \cdot \mathbb{E}_{(X^i, Y^i) \sim D^i}[\mathcal{L}(f(X^i), Y^i)]}{\sum_{i=1}^N |D^i|} $, where $D^i$ is the local dataset of client $i$ and $N$ is the number of clients. Empirical results on CIFAR-10/100 and Tiny-ImageNet show that FedSA outperforms eight baselines by up to about 4.7 percentage points in test accuracy and converges faster with modest additional computation. This work provides a principled approach to mitigating parameter skew, improving generalization in heterogeneous FL settings and guiding future discrepancy-aware aggregation, with potential applicability to larger models.
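The global objective above is a dataset-size-weighted average of per-client expected losses. A minimal sketch of that arithmetic follows; the function name `global_loss` and the example numbers are illustrative assumptions, not taken from the paper.

```python
def global_loss(client_losses, client_sizes):
    """Weighted average of per-client losses, weighted by local dataset size |D^i|.

    client_losses: estimated expected loss on each client (hypothetical inputs).
    client_sizes:  number of samples |D^i| held by each client.
    """
    if len(client_losses) != len(client_sizes):
        raise ValueError("one loss and one dataset size are needed per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    # Numerator: sum_i |D^i| * E[L_i]; denominator: sum_i |D^i|.
    return sum(n * loss for n, loss in zip(client_sizes, client_losses)) / total


# Two clients: 100 samples with loss 1.0, 300 samples with loss 3.0.
example = global_loss([1.0, 3.0], [100, 300])
```

Under this weighting, clients holding more data contribute proportionally more to the objective, so the larger client in the example pulls the result toward its own loss.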
Abstract
Federated Learning (FL) has emerged as an effective solution for training deep learning models across different data owners without exchanging raw data. However, statistical heterogeneity in FL presents a key challenge: it skews the distributions of local model parameters, a phenomenon that researchers have largely overlooked. In this work, we propose the concept of parameter skew to describe this phenomenon, which can substantially affect the accuracy of global model parameter estimation. To address the implications of parameter skew, we introduce FedSA, an aggregation strategy for obtaining a high-quality global model. Specifically, we categorize parameters into high-dispersion and low-dispersion groups based on the coefficient of variation. For high-dispersion parameters, Micro-Classes (MIC) and Macro-Classes (MAC) represent the dispersion at the micro and macro levels, respectively, forming the foundation of FedSA. To evaluate the effectiveness of FedSA, we conduct extensive experiments with different FL algorithms on three computer vision datasets. FedSA outperforms eight state-of-the-art baselines by about 4.7% in test accuracy.
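The grouping step described above can be illustrated with a small sketch. It assumes the standard definition of the coefficient of variation (standard deviation divided by the absolute mean), computed per parameter across clients. The threshold value, the function names, and the `eps` guard are hypothetical choices for illustration; the MIC/MAC weighting that FedSA applies to high-dispersion parameters is not reproduced here.

```python
from statistics import fmean, pstdev


def coefficient_of_variation(values, eps=1e-12):
    """Population std divided by |mean|; eps (an assumption) avoids division by zero."""
    return pstdev(values) / (abs(fmean(values)) + eps)


def split_by_dispersion(client_params, threshold=1.0):
    """Split parameter indices into high- and low-dispersion groups.

    client_params: list of per-client parameter vectors of equal length.
    threshold:     hypothetical cut-off on the coefficient of variation.
    """
    # Transpose so each tuple holds one parameter's values across all clients.
    per_param = list(zip(*client_params))
    cvs = [coefficient_of_variation(values) for values in per_param]
    high = [i for i, cv in enumerate(cvs) if cv > threshold]
    low = [i for i, cv in enumerate(cvs) if cv <= threshold]
    return cvs, high, low


# Three clients, two parameters: the first is nearly identical across clients,
# the second varies widely around a small mean.
clients = [[1.0, -1.0], [1.1, 1.0], [0.9, 0.3]]
cvs, high, low = split_by_dispersion(clients)
```

In this toy input, the first parameter lands in the low-dispersion group and the second in the high-dispersion group, which is the kind of separation the abstract describes before any class-based weighting is applied.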
