Robust Model Aggregation for Heterogeneous Federated Learning: Analysis and Optimizations

Yumeng Shao, Jun Li, Long Shi, Kang Wei, Ming Ding, Qianmu Li, Zengxiang Li, Wen Chen, Shi Jin

TL;DR

The paper tackles latency and update inconsistency in federated learning on heterogeneous devices by introducing time-driven SFL (T-SFL), in which the server aggregates at fixed time intervals and each client contributes however many local iterations it completed in that interval. It derives an upper bound on the global loss under T-SFL and optimizes the aggregation weights to minimize this bound. It then introduces the discriminative model selection (DMS) algorithm, which discards local models from clients whose iteration count falls below a predetermined threshold so that aggregation weights reflect each client's actual contribution. Empirical evaluations on MNIST, CIFAR-10, Fashion-MNIST, and SVHN show that T-SFL with DMS can cut the latency of conventional SFL by about 50% and achieve up to 7% accuracy gains over AFL baselines, with competitive performance versus FedAvg/FedProx. The approach targets edge scenarios with dynamic participation and resource heterogeneity, and is supported by convergence analysis and experiments.

Abstract

Conventional synchronous federated learning (SFL) frameworks suffer from performance degradation in heterogeneous systems due to imbalanced local data size and diverse computing power on the client side. To address this problem, asynchronous FL (AFL) and semi-asynchronous FL have been proposed to recover the performance loss by allowing asynchronous aggregation. However, asynchronous aggregation incurs a new problem of inconsistency between local updates and global updates. Motivated by the issues of conventional SFL and AFL, we first propose a time-driven SFL (T-SFL) framework for heterogeneous systems. The core idea of T-SFL is that the server aggregates the models from different clients, each with varying numbers of iterations, at regular time intervals. To evaluate the learning performance of T-SFL, we provide an upper bound on the global loss function. Further, we optimize the aggregation weights to minimize the developed upper bound. Then, we develop a discriminative model selection (DMS) algorithm that removes local models from clients whose number of iterations falls below a predetermined threshold. In particular, this algorithm ensures that each client's aggregation weight accurately reflects its true contribution to the global model update, thereby improving the efficiency and robustness of the system. To validate the effectiveness of T-SFL with the DMS algorithm, we conduct extensive experiments using several popular datasets including MNIST, CIFAR-10, Fashion-MNIST, and SVHN. The experimental results demonstrate that T-SFL with the DMS algorithm can reduce the latency of conventional SFL by 50%, while achieving an average 3% improvement in learning accuracy over state-of-the-art AFL algorithms.
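
The aggregation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `local_iterations` and `dms_aggregate`, the per-iteration timing model, and the example weights are all assumptions made for illustration, and the weights here are simply supplied by the caller rather than obtained from the paper's bound-minimizing optimization.

```python
from math import floor


def local_iterations(interval, seconds_per_iter):
    """Iterations each client completes within one aggregation interval.

    Assumption: every client spends a constant time per local iteration.
    """
    return [floor(interval / s) for s in seconds_per_iter]


def dms_aggregate(local_models, iterations, weights, min_iterations):
    """Weighted average of the local models that pass the iteration threshold.

    Clients whose iteration count falls below `min_iterations` are dropped,
    and the remaining (caller-supplied, hypothetical) weights are
    renormalized so that they sum to one.
    """
    kept = [
        (model, weight)
        for model, tau, weight in zip(local_models, iterations, weights)
        if tau >= min_iterations
    ]
    if not kept:
        raise ValueError("no client reached the iteration threshold")
    total = sum(weight for _, weight in kept)
    dim = len(kept[0][0])
    return [sum(weight / total * model[j] for model, weight in kept)
            for j in range(dim)]


# Three clients with different speeds share one 10-second interval.
taus = local_iterations(10.0, [1.0, 2.0, 5.0])
models = [[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]]
global_model = dms_aggregate(models, taus, [1.0, 1.0, 2.0], min_iterations=3)
print(taus, global_model)
```

In this toy run the slowest client finishes only two iterations, so its model is excluded and the other two are averaged with equal renormalized weights. How the threshold and the weights should actually be chosen is what the paper's analysis addresses.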

Paper Structure

This paper contains 20 sections, 5 theorems, 33 equations, 7 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

The upper bound on the loss function of T-SFL is expressed as where $\Gamma_i = \Vert \boldsymbol{w}^* - \boldsymbol{w}_i^* \Vert^2$, $\boldsymbol{w}^*$ denotes the optimal global model, $\boldsymbol{w}_i^*$ denotes the optimal local model, $\frac{\Vert \boldsymbol{w}^0 - \boldsymbol{w}^* \Vert_2^2}{W}$ represents the initial item, $\frac{X}{W}$ represents the gradi

Figures (7)

  • Figure 1: System model of T-SFL. For example, during the $t$-th communication interval, the server generates the global model based on the individual models submitted by each client with varying numbers of iterations $\tau_1, \tau_2, \ldots, \tau_N$.
  • Figure 2: Performance comparison between conventional FL, AFL, and T-SFL. (a) Test accuracy of T-SFL versus $T$ on MNIST dataset for systems with varying degrees of heterogeneity. (b) Training time comparison on MNIST dataset for systems with varying degrees of heterogeneity.
  • Figure 3: Test accuracy versus $T$ for (a) MNIST dataset, (b) CIFAR-10 dataset, (c) F-MNIST dataset, and (d) SVHN dataset under Case 1.
  • Figure 4: Test accuracy versus $T$ for (a) MNIST dataset, (b) CIFAR-10 dataset, (c) F-MNIST dataset, and (d) SVHN dataset under Case 2.
  • Figure 5: Test accuracy versus $T$ for (a) MNIST dataset, (b) CIFAR-10 dataset, (c) F-MNIST dataset, and (d) SVHN dataset under Case 3.
  • ...and 2 more figures

Theorems & Definitions (8)

  • Definition 1
  • Definition 2
  • Theorem 1
  • Theorem 2
  • Corollary 1
  • Theorem 3
  • Definition 3
  • Theorem 4