
FedDUAL: A Dual-Strategy with Adaptive Loss and Dynamic Aggregation for Mitigating Data Heterogeneity in Federated Learning

Pranab Sahoo, Ashutosh Tripathi, Sriparna Saha, Samrat Mondal

TL;DR

FedDUAL addresses data heterogeneity in federated learning by pairing an adaptive, KL-regularized client-side loss with distribution-aware server-side aggregation that uses a Wasserstein barycenter for the last layers. The adaptive loss balances local optimization with global coherence via a dynamically tuned parameter $\beta$, while the server aggregates final-layer updates through a Sinkhorn-approximated Wasserstein barycenter to align diverse client learning behaviors. The approach yields faster convergence, higher accuracy, and flatter loss landscapes across CIFAR-10/100 and FMNIST under severe non-IID conditions, with theoretical convergence guarantees. Practically, FedDUAL offers robust generalization in heterogeneous real-world settings, at the cost of additional server-side computation that remains tractable because it is applied selectively to the final layers. Overall, the dual strategy is a meaningful advance in scalable, privacy-preserving learning under label skew and other non-IID data challenges.
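The TL;DR says the server aggregates final-layer updates through a Sinkhorn-approximated Wasserstein barycenter. This summary does not say how final-layer parameters are turned into probability measures, so the following is only a minimal sketch of the barycenter computation itself. It assumes each client contributes a histogram over a shared 1-D support, weighted by a hypothetical per-client weight (for example, its data fraction). It uses iterative Bregman projections, a standard Sinkhorn-style scheme for entropic barycenters. The function name and its parameters (`eps`, `n_iter`, `tol`) are illustrative and do not represent FedDUAL's actual interface.

```python
import numpy as np

def sinkhorn_barycenter(hists, weights, cost, eps=0.01, n_iter=1000, tol=1e-10):
    """Entropic Wasserstein barycenter of histograms on a shared support,
    computed with iterative Bregman projections (Benamou et al., 2015)."""
    hists = np.asarray(hists, dtype=float)        # (n_clients, n_bins), rows sum to 1
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()             # barycentric weights
    kernel = np.exp(-cost / eps)                  # Gibbs kernel K = exp(-C / eps)
    v = np.ones_like(hists)                       # one scaling vector per client
    bary = np.full(hists.shape[1], 1.0 / hists.shape[1])
    for _ in range(n_iter):
        u = hists / (v @ kernel.T)                # u_k = a_k / (K v_k): match each client marginal
        ktu = u @ kernel                          # row k holds K^T u_k
        new_bary = np.exp(weights @ np.log(ktu))  # weighted geometric mean over clients
        v = new_bary / ktu                        # v_k = p / (K^T u_k): match the barycenter marginal
        converged = np.abs(new_bary - bary).max() < tol
        bary = new_bary
        if converged:
            break
    return bary / bary.sum()                      # renormalize away residual drift

# Example: two "clients" whose histograms sit at opposite ends of [0, 1].
grid = np.linspace(0.0, 1.0, 101)
cost = (grid[:, None] - grid[None, :]) ** 2       # squared-distance ground cost

def bump(mu, sigma=0.05):
    h = np.exp(-((grid - mu) ** 2) / (2 * sigma ** 2))
    return h / h.sum()

bary = sinkhorn_barycenter([bump(0.2), bump(0.8)], [0.5, 0.5], cost)
```

The arithmetic mean of the two client histograms keeps two separate bumps at 0.2 and 0.8. The barycenter, by contrast, concentrates its mass in a single bump near 0.5. This is the sense in which a barycenter aligns differing client distributions rather than blending them. A smaller `eps` gives a sharper barycenter but needs more iterations to converge.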

Abstract

Federated Learning (FL) trains a distributed model by combining locally optimized models from multiple clients into a single global model. Although FL preserves data privacy by avoiding centralized storage, heterogeneity in client data distributions degrades the global model's performance, slows its convergence, and reduces its robustness. Among the forms of data heterogeneity, label skew is particularly challenging and prevalent, especially in domains such as image classification. To address these challenges, we begin with comprehensive experiments to pinpoint the underlying issues in the FL training process. Based on these findings, we introduce a dual-strategy approach. First, an adaptive loss function for client-side training preserves previously acquired knowledge while balancing local optimization against global model coherence. Second, a dynamic aggregation strategy at the server adapts to each client's learning patterns, addressing the diversity of data across the network. An evaluation on three real-world datasets (CIFAR-10, CIFAR-100, and FMNIST), together with theoretical convergence guarantees, shows that our method outperforms several established state-of-the-art approaches.
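The TL;DR describes the client-side loss as KL-regularized, with a dynamically tuned parameter that trades local optimization against global coherence. The exact form is not given here. The following minimal sketch therefore assumes a common distillation-style pattern: cross-entropy on the client's labels plus beta times KL(p_global || p_local), where p_global comes from the frozen global model on the same batch. The adaptive rule for beta is not reproduced. Beta is simply an argument, and `kl_regularized_loss` is a hypothetical name.

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax with the max-shift trick for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def kl_regularized_loss(local_logits, global_logits, labels, beta):
    """Cross-entropy on the client's labels plus beta * KL(p_global || p_local).

    Assumption: the global model's predictions act as a frozen teacher, so the
    KL term penalizes the local model for drifting away from what the global
    model already knows. Returns (total, ce, kl), each averaged over the batch.
    """
    log_p_local = log_softmax(np.asarray(local_logits, dtype=float))
    log_p_global = log_softmax(np.asarray(global_logits, dtype=float))
    labels = np.asarray(labels)
    # Negative log-likelihood of the true class under the local model.
    ce = -log_p_local[np.arange(len(labels)), labels].mean()
    # KL(p_global || p_local), computed in log space to avoid log(0).
    p_global = np.exp(log_p_global)
    kl = (p_global * (log_p_global - log_p_local)).sum(axis=1).mean()
    return ce + beta * kl, ce, kl
```

Setting beta to 0 recovers plain local training. Larger values keep the client closer to the global model, which is the balance the adaptive beta is meant to manage.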

Paper Structure

This paper contains 31 sections, 115 equations, 14 figures, 5 tables, and 1 algorithm.

Figures (14)

  • Figure 1: Visualization of the loss surface for the global model trained on the FMNIST dataset using the FedAvg algorithm: (a) depicts the loss landscape when trained on IID data, while (b) illustrates the landscape for non-IID data distribution.
  • Figure 2: Comparison of gradient norms between models trained on IID and non-IID datasets using the FedAvg algorithm: (a) FMNIST dataset using the LeNet model; (b) CIFAR-10 dataset using the VGG16 model.
  • Figure 3: Learning curves comparing the proposed method with baselines across various datasets: (a) CIFAR-10, (b) CIFAR-100, and (c) FMNIST.
  • Figure 4: Number of FL rounds required to reach the target accuracy for the proposed method and other baselines on different datasets: (a) CIFAR-10, (b) CIFAR-100, and (c) FMNIST.
  • Figure 5: Illustration of the dynamic aggregation method applied across various layers of the neural network.
  • ...and 9 more figures
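Figures 1 and 2 use models trained with FedAvg, and Figure 5 concerns applying the dynamic aggregation across different layers. The TL;DR notes that the barycenter step is applied selectively to the final layers to keep server cost tractable. The sketch below shows only that routing, under stated assumptions. Models are hypothetical dicts mapping layer names to NumPy arrays. Every layer is aggregated with FedAvg (a data-size-weighted mean) except those listed in a hypothetical `special_layers` set, which go through a pluggable `special_fn`. In FedDUAL that slot would hold the distribution-aware aggregator; here it is only a placeholder.

```python
import numpy as np

def fedavg(arrays, sizes):
    """Data-size-weighted average (FedAvg) of one layer across clients."""
    w = np.asarray(sizes, dtype=float)
    w = w / w.sum()
    return sum(wk * np.asarray(a, dtype=float) for wk, a in zip(w, arrays))

def aggregate_layerwise(client_states, sizes, special_layers, special_fn):
    """FedAvg every layer except those in special_layers, which use special_fn.

    client_states: list of dicts mapping layer name -> np.ndarray (hypothetical format).
    special_fn(arrays, sizes) -> np.ndarray: placeholder for a distribution-aware
    aggregator, such as a Wasserstein-barycenter step (not implemented here).
    """
    global_state = {}
    for name in client_states[0]:
        arrays = [state[name] for state in client_states]
        if name in special_layers:
            global_state[name] = special_fn(arrays, sizes)
        else:
            global_state[name] = fedavg(arrays, sizes)
    return global_state

# Example: two clients; the final layer "fc" is routed to a placeholder aggregator.
s1 = {"conv": np.array([0.0, 0.0]), "fc": np.array([2.0])}
s2 = {"conv": np.array([4.0, 4.0]), "fc": np.array([8.0])}
placeholder = lambda arrays, sizes: np.median(np.stack(arrays), axis=0)
merged = aggregate_layerwise([s1, s2], sizes=[1, 3], special_layers={"fc"}, special_fn=placeholder)
# merged["conv"] -> [3.0, 3.0] (FedAvg with weights 1/4 and 3/4); merged["fc"] -> [5.0] (median)
```

Keeping the expensive aggregator behind a per-layer switch means the rest of the network pays only FedAvg's cost.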