Debiasing Federated Learning with Correlated Client Participation

Zhenyu Sun, Ziyang Zhang, Zheng Xu, Gauri Joshi, Pranay Sharma, Ermin Wei

TL;DR

This work tackles bias in FedAvg caused by non-uniform, time-correlated client participation under a minimum-separation constraint $R$. It introduces an $R$-order Markov-chain framework to rigorously characterize participation dynamics, proving that asymptotic bias vanishes only when the induced sampling distribution $\pi_R$ is uniform. To address this, the authors propose Debiasing FedAvg, which estimates $\pi_R^i$ online and reweights local updates by $1/\pi_R^i$, yielding provable convergence to the unbiased optimum for arbitrary $R$ (excluding the cyclic case $R=M-1$). Theoretical results are supported by synthetic and MNIST experiments, showing reduced bias and often faster convergence as $R$ grows, highlighting practical paths to robust FL under realistic participation patterns. The approach advances federated learning by providing unbiased convergence guarantees without requiring known participation distributions and by showcasing a scalable debiasing mechanism adaptable to existing FL algorithms.
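The reweighting idea can be illustrated with a toy sketch. This is not the authors' algorithm or code: it assumes scalar quadratic local objectives $f_i(x) = \frac{1}{2}(x - c_i)^2$, one client per round, a single local gradient step, and a sampler that picks among eligible clients in proportion to hypothetical availability weights. The function name `run_fedavg`, its parameters, and the extra normalization by the number of clients (so that uniform participation gives weight 1) are all illustrative assumptions.

```python
import random

def run_fedavg(targets, avail, R=0, rounds=20000, lr=0.01, debias=False, seed=0):
    """Toy FedAvg on f_i(x) = 0.5 * (x - targets[i])**2 with one client per round.

    Assumption: a client is eligible only if more than R rounds have passed
    since it last participated; among eligible clients, one is drawn with
    probability proportional to its availability weight.
    """
    rng = random.Random(seed)
    n = len(targets)
    if R > n - 1:
        raise ValueError("R must be at most n - 1 when one client is sampled per round")
    last = [None] * n   # round in which each client last participated
    counts = [0] * n    # participation counts used for the online estimate
    x = 0.0
    tail = []           # iterates from the second half of training
    for t in range(1, rounds + 1):
        eligible = [j for j in range(n) if last[j] is None or t - last[j] > R]
        i = rng.choices(eligible, weights=[avail[j] for j in eligible], k=1)[0]
        last[i] = t
        counts[i] += 1
        grad = x - targets[i]        # gradient of the local quadratic
        if debias:
            pi_hat = counts[i] / t   # empirical participation frequency of client i
            grad /= n * pi_hat       # weight 1 / (n * pi_hat); the factor n is an assumption
        x -= lr * grad
        if t > rounds // 2:
            tail.append(x)
    # Average the late iterates to smooth out stochastic noise.
    return sum(tail) / len(tail)

if __name__ == "__main__":
    targets = [float(i) for i in range(10)]    # unbiased optimum is the mean, 4.5
    avail = [float(i + 1) for i in range(10)]  # skewed availability (hypothetical)
    print("vanilla :", run_fedavg(targets, avail, debias=False))
    print("debiased:", run_fedavg(targets, avail, debias=True))
```

In this toy setting the unweighted iterate drifts toward the participation-weighted average of the $c_i$, while the reweighted iterate settles near their plain mean, which is the unbiased optimum.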

Abstract

In cross-device federated learning (FL) with millions of mobile clients, only a small subset of clients participate in training in every communication round, and Federated Averaging (FedAvg) is the most popular algorithm in practice. Existing analyses of FedAvg usually assume the participating clients are independently sampled in each round from a uniform distribution, which does not reflect real-world scenarios. This paper introduces a theoretical framework that models client participation in FL as a Markov chain to study optimization convergence when clients have non-uniform and correlated participation across rounds. We apply this framework to analyze a more general and practical pattern: every client must wait a minimum number of $R$ rounds (minimum separation) before re-participating. We theoretically prove and empirically observe that increasing minimum separation reduces the bias induced by intrinsic non-uniformity of client availability in cross-device FL systems. Furthermore, we develop an effective debiasing algorithm for FedAvg that provably converges to the unbiased optimal solution under arbitrary minimum separation and unknown client availability distribution.
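The effect of minimum separation on participation frequencies can be explored with a small simulation. The sketch below is an illustration under stated assumptions, not the paper's experimental setup: each round, $B$ clients are drawn without replacement from those that have waited more than $R$ rounds, with probability proportional to hypothetical availability weights. The names `simulate_participation` and `tv_from_uniform` are introduced here for illustration only.

```python
import random

def simulate_participation(weights, R, rounds, B=1, seed=0):
    """Return empirical participation frequencies under minimum separation R.

    Assumption: a client that participated in round s is eligible again only
    in rounds t with t - s > R, and eligible clients are sampled in proportion
    to their availability weights.
    """
    rng = random.Random(seed)
    n = len(weights)
    if n - R * B < B:
        raise ValueError("too few clients for this combination of R and B")
    last = [None] * n   # round of each client's most recent participation
    counts = [0] * n
    for t in range(rounds):
        eligible = [i for i in range(n) if last[i] is None or t - last[i] > R]
        for _ in range(B):
            w = [weights[i] for i in eligible]
            pick = rng.choices(eligible, weights=w, k=1)[0]
            eligible.remove(pick)   # sample without replacement within a round
            last[pick] = t
            counts[pick] += 1
    total = rounds * B
    return [c / total for c in counts]

def tv_from_uniform(freq):
    """Total variation distance between freq and the uniform distribution."""
    n = len(freq)
    return 0.5 * sum(abs(f - 1.0 / n) for f in freq)

if __name__ == "__main__":
    weights = [float(i + 1) for i in range(10)]  # skewed availability (hypothetical)
    for R in (0, 2, 4, 8):
        freq = simulate_participation(weights, R=R, rounds=20000)
        print(R, round(tv_from_uniform(freq), 4))
```

With one client per round and ten clients, $R=9$ would leave exactly one eligible client each round, which corresponds to the cyclic case mentioned in the TL;DR.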

Paper Structure (23 sections, 88 equations, 4 figures, 1 algorithm)

Figures (4)

  • Figure 1: Distance between $\pi_R$ and the uniform distribution as $R$ increases ($N=500, B=1$)
  • Figure 2: Experiments on synthetic dataset. (a) Training loss of Vanilla FedAvg after convergence for different $R$. Larger $R$ leads to smaller bias. (b) Debiasing FedAvg is tested under different values of $R$, where the red line represents Vanilla FedAvg when clients are sampled under an oracle uniform distribution. The subfigure on the right shows that all curves reach the unbiased objective after convergence, indicating that the asymptotic bias is effectively canceled.
  • Figure 3: Experiments on MNIST. (a) Convergence of Debiasing FedAvg under different values of the client minimum separation $R$. The red horizontal line is the objective value reached by Vanilla FedAvg when clients are sampled under an oracle uniform distribution. Debiasing FedAvg converges to the unbiased objective, and converges faster with larger $R$. (b) For Vanilla FedAvg, increasing $R$ yields smaller bias. (c) Comparison of Vanilla FedAvg, FedVARP, and Debiasing FedAvg at $R=8$. Note that both Vanilla FedAvg and FedVARP are designed only for uniform client sampling and hence are significantly affected by bias from client participation.
  • Figure 4: Convergence of client sampling distribution to $\pi_R$ for different $R$ ($N=100, B=1$).
