FedDUAL: A Dual-Strategy with Adaptive Loss and Dynamic Aggregation for Mitigating Data Heterogeneity in Federated Learning
Pranab Sahoo, Ashutosh Tripathi, Sriparna Saha, Samrat Mondal
TL;DR
FedDUAL addresses data heterogeneity in federated learning by pairing an adaptive, KL-regularized client-side loss with a distribution-aware server-side aggregation that uses a Wasserstein barycenter for the final layers. The adaptive loss balances local optimization with global coherence via a dynamically tuned parameter $\beta$, while the server aggregates final-layer updates through a Sinkhorn-approximated Wasserstein barycenter to align diverse client learning behaviors. The approach yields faster convergence, higher accuracy, and flatter loss landscapes across CIFAR-10/100 and FMNIST under severe non-IID conditions, with theoretical convergence guarantees. Practically, FedDUAL offers robust generalization in heterogeneous real-world settings, at the cost of additional server-side computation that remains tractable when applied selectively to the final layers. Overall, the dual strategy advances scalable, privacy-preserving learning under label skew and other non-IID data challenges.
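To make the server-side step more concrete, here is a minimal NumPy sketch of an entropic (Sinkhorn-style) Wasserstein barycenter computed by iterative Bregman projections. It is an illustration under stated assumptions, not FedDUAL's implementation: it assumes each client contributes a normalized histogram on a shared grid, and how final-layer updates are mapped to such distributions is not specified in this summary. The function name `sinkhorn_barycenter` and the parameters `eps` and `n_iters` are hypothetical.

```python
import numpy as np

def sinkhorn_barycenter(hists, cost, weights, eps=1.0, n_iters=100):
    """Entropic Wasserstein barycenter of several histograms.

    hists:   (m, n) array, each row a histogram summing to 1 (assumed input form).
    cost:    (n, n) ground-cost matrix on the shared support.
    weights: (m,) barycentric weights summing to 1.
    eps:     entropic regularization strength (hypothetical default).
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(hists)          # one scaling vector per histogram
    b = None
    for _ in range(n_iters):
        # Projection 1: make each coupling's column marginal match its histogram.
        v = hists / (u @ K)          # row k holds hists[k] / (K^T u_k)
        Kv = v @ K.T                 # row k holds K v_k
        # Projection 2: force all row marginals to a common value, which is
        # the weighted geometric mean of the current row marginals.
        row_marginals = u * Kv
        b = np.exp(weights @ np.log(row_marginals))
        u = b / Kv
    return b

# Usage: two point masses at opposite ends of a 5-point grid.
grid = np.arange(5.0)
cost = (grid[:, None] - grid[None, :]) ** 2
hists = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0, 1.0]])
weights = np.array([0.5, 0.5])
barycenter = sinkhorn_barycenter(hists, cost, weights)
```

With equal weights and a squared-distance cost, the barycenter concentrates around the midpoint of the grid instead of averaging the two spikes in place, which is the behavior that distinguishes it from element-wise averaging.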
Abstract
Federated Learning (FL) is an approach to distributed model training that combines locally optimized models from various clients into a unified global model. While FL preserves data privacy by eliminating centralized storage, heterogeneity in client data distributions degrades the global model's performance, slows convergence, and reduces robustness. Among the various forms of data heterogeneity, label skew is particularly challenging and prevalent, especially in domains such as image classification. To address these challenges, we begin with comprehensive experiments to pinpoint the underlying issues in the FL training process. Based on our findings, we then introduce a dual-strategy approach designed to resolve them. First, we introduce an adaptive loss function for client-side training, designed to preserve previously acquired knowledge while balancing local optimization against global model coherence. Second, we develop a dynamic aggregation strategy for combining client models at the server. This strategy adapts to each client's learning patterns, addressing the challenges of diverse data across the network. Our evaluation on three real-world datasets, together with theoretical convergence guarantees, shows that our method outperforms several established state-of-the-art approaches.
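As a rough illustration of the client-side idea, the sketch below combines a cross-entropy term on local labels with a KL term that pulls the local model's predictions toward the global model's, weighted by a scalar `beta`. The exact form of FedDUAL's loss and the rule for tuning the weight are not given in this abstract, so the additive form `ce + beta * kl`, the direction of the KL term, and all names here are assumptions.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def kl_regularized_loss(local_logits, global_logits, labels, beta):
    """Cross-entropy plus beta * KL(global || local), averaged over the batch.

    beta is passed in as a fixed scalar here; a dynamic schedule for it
    would be layered on top and is not modeled in this sketch.
    """
    log_p_local = log_softmax(local_logits)
    log_p_global = log_softmax(global_logits)
    # Cross-entropy of the local model on the true labels.
    ce = -log_p_local[np.arange(len(labels)), labels].mean()
    # KL divergence from the global predictions to the local predictions.
    p_global = np.exp(log_p_global)
    kl = (p_global * (log_p_global - log_p_local)).sum(axis=1).mean()
    return ce + beta * kl, ce, kl

# Usage with random logits for a batch of 4 samples and 3 classes.
rng = np.random.default_rng(0)
local_logits = rng.normal(size=(4, 3))
global_logits = rng.normal(size=(4, 3))
labels = np.array([0, 2, 1, 1])
loss, ce, kl = kl_regularized_loss(local_logits, global_logits, labels, beta=0.5)
```

A larger `beta` keeps the client closer to the global model, which preserves previously acquired knowledge, while a smaller value lets the client fit its own label distribution more freely.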
