FedSat: A Statistical Aggregation Approach for Class Imbalanced Clients in Federated Learning
Sujit Chowdhury, Raju Halder
TL;DR
FedSat tackles three forms of data heterogeneity in federated learning (label skewness, missing classes, and quantity skewness) by integrating a prediction-sensitive loss with a prioritized-class based weighted aggregation scheme. It formalizes a global objective and evaluates client updates in two stages, using worker sets to compute class-aware statistics that guide robust aggregation. Empirical results on MNIST, CIFAR-10, and CIFAR-100 under the LS, LSMC, and LQSMC settings show consistent improvements over baselines, with faster convergence and stronger performance on underrepresented classes, including substantial gains in extreme non-IID settings. The approach offers a scalable, robust solution for real-world heterogeneous FL and points to future work on privacy-preserving and security-enhanced extensions such as differential privacy, secure aggregation, and blockchain integration.
Abstract
Federated learning (FL) has emerged as a promising paradigm for privacy-preserving distributed machine learning, but it faces challenges when data distributions are heterogeneous across clients. This paper presents FedSat, a novel FL approach designed to handle three forms of data heterogeneity simultaneously, namely label skewness, missing classes, and quantity skewness, by combining a prediction-sensitive loss function with a prioritized-class based weighted aggregation scheme. The prediction-sensitive loss function enhances model performance on minority classes, while the prioritized-class based weighted aggregation scheme ensures that client contributions are weighted by both statistical significance and performance on critical classes. Extensive experiments across diverse data-heterogeneity settings demonstrate that FedSat significantly outperforms state-of-the-art baselines, with an average improvement of 1.8% over the second-best method and 19.87% over the weakest-performing baseline. FedSat also converges faster than existing methods. These results highlight FedSat's effectiveness in addressing the challenges of heterogeneous federated learning and its potential for real-world applications.
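To make the idea of weighting clients by both statistical significance and performance on critical classes concrete, the following is a minimal sketch and not FedSat's actual aggregation rule. The blending formula, the `alpha` parameter, and the function names `aggregation_weights` and `aggregate` are all hypothetical and chosen for illustration; the abstract does not specify how the two signals are combined.

```python
from typing import List, Sequence


def aggregation_weights(
    sample_counts: Sequence[int],
    class_accuracy: Sequence[Sequence[float]],
    priority_classes: Sequence[int],
    alpha: float = 0.5,
) -> List[float]:
    """Blend each client's data share with its accuracy on prioritized classes.

    ASSUMPTION: a convex combination controlled by `alpha` is an illustrative
    choice, not the rule used in the paper.
    """
    total = sum(sample_counts)
    size_share = [n / total for n in sample_counts]

    # Mean accuracy of each client on the prioritized (critical) classes.
    prio_score = [
        sum(acc[c] for c in priority_classes) / len(priority_classes)
        for acc in class_accuracy
    ]
    score_total = sum(prio_score)
    if score_total == 0:
        # No client performs on the prioritized classes: fall back to uniform.
        prio_share = [1.0 / len(prio_score)] * len(prio_score)
    else:
        prio_share = [s / score_total for s in prio_score]

    # Both shares sum to 1, so the blended weights also sum to 1.
    return [alpha * a + (1 - alpha) * b for a, b in zip(size_share, prio_share)]


def aggregate(
    client_params: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted average of flattened client parameter vectors."""
    dim = len(client_params[0])
    return [
        sum(w * params[i] for w, params in zip(weights, client_params))
        for i in range(dim)
    ]


# Two clients: the second holds more data but performs poorly on class 1.
weights = aggregation_weights(
    sample_counts=[100, 300],
    class_accuracy=[[0.9, 0.8], [0.9, 0.2]],
    priority_classes=[1],
)
global_params = aggregate([[1.0, 0.0], [0.0, 1.0]], weights)
```

In this toy setup, a purely size-based average would give the second client three times the weight of the first. Blending in the prioritized-class score pulls the two weights close together, which is the qualitative effect the abstract describes.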
