MimiC: Combating Client Dropouts in Federated Learning by Mimicking Central Updates

Yuchang Sun, Yuyi Mao, Jun Zhang

TL;DR

This work addresses convergence problems that arbitrary client dropouts cause in cross-device FL. It first shows that, under decaying learning rates, FedAvg can fail to converge because of a bias between the aggregated update and the central gradient. It then introduces MimiC, a server-side correction that mimics the central update via history-based correction terms, which keeps the divergence bounded and ensures convergence to a stationary point under proper learning-rate schedules. The theoretical results establish convergence under both deterministic and probabilistic dropouts, and experiments on FMNIST and CIFAR-10 show that MimiC achieves higher accuracy and more stable training than FedAvg, FedProx, SCAFFOLD, and MIFA. The approach is practical, since it requires no extra client computation, and privacy-friendly, which makes it well suited to reliable FL in mobile edge environments.
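The bias between the aggregated update and the central gradient can be seen in a one-dimensional toy problem. This is only an illustration under assumed conditions, not the paper's setup: each client $i$ has a hypothetical quadratic local loss $f_i(w) = \tfrac{1}{2}(w - w^i_*)^2$, each local update is a single gradient, and the helper names are made up for this example. It compares the imaginary central update (average over all clients) with the FedAvg aggregate over the active clients only.

```python
from statistics import fmean


def local_gradient(w, w_star_i):
    # Gradient of the toy local loss f_i(w) = 0.5 * (w - w_star_i)**2
    return w - w_star_i


def central_update(w, local_optima):
    # Imaginary central update: average gradient over ALL clients
    return fmean(local_gradient(w, o) for o in local_optima)


def fedavg_update(w, local_optima, active):
    # FedAvg under dropouts: average over the active clients only
    return fmean(local_gradient(w, local_optima[i]) for i in active)


if __name__ == "__main__":
    optima = [0.0, 1.0, 2.0, 3.0]  # the global optimum is their mean, 1.5
    w = 1.5
    print(central_update(w, optima))         # 0.0
    print(fedavg_update(w, optima, [0, 1]))  # 1.0
```

At the global optimum $w_* = 1.5$ the central update vanishes, but if only clients 0 and 1 respond, the FedAvg aggregate is 1.0 and pushes the model away from the stationary point, consistent with the paper's observation that FedAvg oscillates around a stationary point under dropouts.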

Abstract

Federated learning (FL) is a promising framework for privacy-preserving collaborative learning, where model training tasks are distributed to clients and only the model updates need to be collected at a server. However, when FL is deployed in mobile edge networks, clients may have unpredictable availability and drop out of the training process, which hinders the convergence of FL. This paper tackles this critical challenge. Specifically, we first investigate the convergence of the classical FedAvg algorithm with arbitrary client dropouts. We find that with the common choice of a decaying learning rate, FedAvg oscillates around a stationary point of the global loss function, which is caused by the divergence between the aggregated update and the desired central update. Motivated by this new observation, we then design a novel training algorithm named MimiC, where the server modifies each received model update based on the previous ones. The proposed modification of the received model updates mimics the imaginary central update regardless of which clients drop out. The theoretical analysis of MimiC shows that the divergence between the aggregated and central updates diminishes with proper learning rates, leading to its convergence. Simulation results further demonstrate that MimiC maintains stable convergence performance and learns better models than the baseline methods.
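The server-side modification described above, where the server corrects each received update using previously received ones, can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's algorithm: updates are scalars, the class and method names are hypothetical, and the correction $c^i$ = (mean of the most recent stored updates) minus (client $i$'s stored update) is an assumed rule. It is chosen so that, when a client's update changes slowly between its active iterations, the corrected average approaches the average over all clients.

```python
from statistics import fmean


class MimicStyleServer:
    """Hypothetical history-based correction in the spirit of MimiC.

    Assumed rule (not the paper's exact formula): when client i is active,
    its update is corrected by c_i = mean(stored updates) - stored update of i,
    where "stored" means the most recent update previously received.
    """

    def __init__(self, num_clients):
        # Most recent update received from each client; None until first seen
        self.last_update = [None] * num_clients

    def aggregate(self, received):
        """received: dict mapping active client id -> scalar update."""
        history = [u for u in self.last_update if u is not None]
        history_mean = fmean(history) if history else 0.0
        corrected = []
        for i, delta in received.items():
            prev = self.last_update[i]
            if prev is None:
                # No history for this client yet: use the raw update
                corrected.append(delta)
            else:
                # Correction term built only from previously received updates
                c_i = history_mean - prev
                corrected.append(delta + c_i)
        # Store this round's raw updates for use in later rounds
        for i, delta in received.items():
            self.last_update[i] = delta
        # The applied global update is the average of the corrected updates
        return fmean(corrected)


if __name__ == "__main__":
    server = MimicStyleServer(4)
    print(server.aggregate({0: 1.5, 1: 0.5, 2: -0.5, 3: -1.5}))  # all active
    print(server.aggregate({0: 1.5, 1: 0.5}))  # clients 2 and 3 drop out
```

In the dropout round only clients 0 and 1 report, so plain averaging would return 1.0, while the corrected average stays at the all-client value 0.0. The sketch needs no extra client computation: every correction is computed from updates the server has already received.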

Paper Structure (18 sections, 11 theorems, 58 equations, 8 figures, 5 tables, 1 algorithm)

Key Result

Lemma 1

For any FL algorithm that follows the global model update scheme (Eq. global-update in the paper), the loss decay in each iteration is upper bounded as follows:
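For orientation, a standard descent inequality of the kind Lemma 1 refers to is shown below; it is not the lemma's exact statement. It assumes (as assumptions here, not details taken from the lemma) that the global loss $f$ is $L$-smooth and that the global update has the form $\mathbf{w}_{t+1} = \mathbf{w}_t - \eta_t \mathbf{v}_t$, where $\mathbf{v}_t$ is the aggregated update. Expanding the inner product with $\langle \mathbf{a}, \mathbf{b} \rangle = \tfrac{1}{2}(\|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 - \|\mathbf{a} - \mathbf{b}\|^2)$ isolates the divergence between $\mathbf{v}_t$ and the central gradient $\nabla f(\mathbf{w}_t)$:

```latex
\begin{aligned}
f(\mathbf{w}_{t+1})
  &\le f(\mathbf{w}_t) - \eta_t \langle \nabla f(\mathbf{w}_t), \mathbf{v}_t \rangle
     + \frac{L\eta_t^2}{2}\|\mathbf{v}_t\|^2 \\
  &= f(\mathbf{w}_t) - \frac{\eta_t}{2}\|\nabla f(\mathbf{w}_t)\|^2
     - \frac{\eta_t}{2}\left(1 - L\eta_t\right)\|\mathbf{v}_t\|^2
     + \frac{\eta_t}{2}\|\nabla f(\mathbf{w}_t) - \mathbf{v}_t\|^2 .
\end{aligned}
```

When $\|\nabla f(\mathbf{w}_t) - \mathbf{v}_t\|^2$ stays bounded away from zero, this bound cannot force $\|\nabla f(\mathbf{w}_t)\|$ to vanish, which is consistent with the oscillation of FedAvg under dropouts; the convergence argument for MimiC rests on this divergence diminishing under proper learning rates.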

Figures (8)

  • Figure 1: An example of client dropouts in a cross-device FL system with ten classes of data. In this iteration, client $2$ is out of battery while client $5$ is disconnected from the server. The consequence is that information about classes $4$ and $5$ is missing in the received model updates.
  • Figure 2: Illustration of the client availability in cross-device FL, where $\tau_{\text{max}}=3$.
  • Figure 3: An illustration of MimiC with four clients. While $\mathbf{w}_*$ is the optimum of the global loss function, clients perform updates towards the optima of their local loss functions (denoted by $\{\mathbf{w}^i_*\}$). For clarity, suppose that clients $1$ and $2$ are both active in iterations $t^{\prime}$ and $t$. In iteration $t$, $\mathbf{c}_{t^{\prime}}^{1}$ and $\mathbf{c}_{t^{\prime}}^{2}$ are used to correct their updates. The average of the modified updates gives the applied global update $\mathbf{v}_t$.
  • Figure 4: Test accuracy on the FMNIST dataset, where client $i$ becomes active every $\tau_{\text{max}}(i)$ iterations ($\tau_{\text{max}} = 20$).
  • Figure 5: Test accuracy on the CIFAR-10 dataset, where client $i$ becomes active every $\tau_{\text{max}}(i)$ iterations ($\tau_{\text{max}} = 20$).
  • ...and 3 more figures
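Figures 2, 4 and 5 describe client availability in which client $i$ becomes active every $\tau_{\text{max}}(i)$ iterations. A minimal sketch of such a deterministic schedule follows, assuming (hypothetically) that client $i$ is active exactly at the iterations $t$ with $t \bmod \tau_{\text{max}}(i) = 0$ and taking $\tau_{\text{max}}$ as the largest per-client period. The function names and the small periods are illustrative only and are not the experimental settings.

```python
def availability_schedule(periods, num_iters):
    """Active client ids per iteration; client i is active when t % periods[i] == 0.

    Hypothetical deterministic pattern: client i joins every periods[i] iterations.
    """
    return [[i for i, p in enumerate(periods) if t % p == 0]
            for t in range(num_iters)]


def longest_gap(schedule, client):
    # Largest number of iterations between two consecutive activations of `client`
    times = [t for t, active in enumerate(schedule) if client in active]
    return max((b - a for a, b in zip(times, times[1:])), default=0)


if __name__ == "__main__":
    periods = [1, 2, 3]  # per-client periods; tau_max = max(periods) = 3
    schedule = availability_schedule(periods, 12)
    print(schedule[:4])  # [[0, 1, 2], [0], [0, 1], [0, 2]]
    print(all(longest_gap(schedule, i) <= max(periods)
              for i in range(len(periods))))  # True
```

Checking `longest_gap` against $\tau_{\text{max}}$ confirms that no client stays inactive for more than $\tau_{\text{max}}$ iterations, which is the kind of bounded-availability condition that Figure 2 illustrates with $\tau_{\text{max}} = 3$.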

Theorems & Definitions (24)

  • Lemma 1
  • Proof
  • Remark 1
  • Lemma 2
  • Proof (sketch)
  • Lemma 3
  • Proof
  • Theorem 1
  • Proof (sketch)
  • Remark 2
  • ...and 14 more