Communication-Efficient Federated Learning With Data and Client Heterogeneity

Hossein Zakerinia, Shayan Talaei, Giorgi Nadiradze, Dan Alistarh

TL;DR

This work tackles scalable Federated Learning under three practical challenges: data heterogeneity across clients, partial client asynchrony, and communication bottlenecks. It introduces QuAFL, a Quantized Asynchronous Federated Learning algorithm that extends FedAvg with non-blocking, quantized updates and a calibrated weighting scheme to accommodate heterogeneous client speeds. The authors provide a rigorous convergence analysis using a potential function and a position-aware lattice quantizer, showing convergence rates close to FedAvg in certain regimes and robustness to slow clients and non-i.i.d. data. Empirical results on LEAF benchmarks with up to 300 clients demonstrate substantial communication compression (over 3x) and improved wall-clock time convergence relative to baselines like FedBuff, highlighting QuAFL’s practical impact for real-world, large-scale federated systems.

Abstract

Federated Learning (FL) enables large-scale distributed training of machine learning models, while still allowing individual nodes to maintain data locally. However, executing FL at scale comes with inherent practical challenges: 1) heterogeneity of the local node data distributions, 2) heterogeneity of node computational speeds (asynchrony), and 3) constraints on the amount of communication between the clients and the server. In this work, we present the first variant of the classic federated averaging (FedAvg) algorithm which simultaneously supports data heterogeneity, partial client asynchrony, and communication compression. Our algorithm comes with a novel, rigorous analysis showing that, in spite of these system relaxations, it can provide similar convergence to FedAvg in interesting parameter regimes. Experimental results on the rigorous LEAF benchmark, on setups of up to 300 nodes, show that our algorithm ensures fast convergence for standard federated tasks, improving upon prior quantized and asynchronous approaches.

Paper Structure

This paper contains 42 sections, 28 theorems, 93 equations, 26 figures.

Key Result

Lemma 3.1

(Lattice Quantization) Fix parameters $R$ and $\gamma > 0$. There exists a quantization procedure defined by an encoding function $Enc_{R,\gamma} : \mathbb{R}^d \rightarrow {\{0,1\}}^*$ and a decoding function $Dec_{R,\gamma} : \mathbb{R}^d \times {\{0,1\}}^* \rightarrow \mathbb{R}^d$ such that, for a
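
The signatures in the lemma (an encoder that sees only the vector, and a decoder that also receives a second vector in $\mathbb{R}^d$) can be illustrated with a toy cubic-lattice scheme. The sketch below is not the paper's construction: it uses deterministic rounding on the grid $\gamma\mathbb{Z}^d$, so it is not unbiased, and the names `encode`, `decode`, `gamma`, and `bits` are assumptions introduced for illustration that need not match the lemma's parameters. It shows only the "position-aware" idea: the sender transmits each grid index modulo $2^{\text{bits}}$, and the receiver resolves the ambiguity by picking the congruent grid point closest to its own reference vector.

```python
from typing import List


def encode(x: List[float], gamma: float, bits: int) -> List[int]:
    """Round each coordinate to the grid gamma*Z and keep the index mod 2**bits."""
    m = 1 << bits
    return [round(v / gamma) % m for v in x]


def decode(reference: List[float], code: List[int],
           gamma: float, bits: int) -> List[float]:
    """Return the grid point congruent to `code` that lies nearest to `reference`."""
    m = 1 << bits
    out = []
    for r, c in zip(reference, code):
        k_ref = round(r / gamma)  # grid index nearest the receiver's reference
        # Offset in [-m/2, m/2) such that k_ref + shift is congruent to c mod m.
        shift = (c - k_ref + m // 2) % m - m // 2
        out.append((k_ref + shift) * gamma)
    return out


# Hypothetical example: the receiver's reference is close to x,
# so 4 bits per coordinate are enough to recover x up to gamma / 2.
x = [0.93, -2.41, 5.07]
reference = [1.0, -2.5, 5.2]
code = encode(x, gamma=0.1, bits=4)
decoded = decode(reference, code, gamma=0.1, bits=4)
```

In this toy version, decoding is correct only when the grid indices of the input and the reference differ by less than $2^{\text{bits}-1}$ in every coordinate. This matches the intuition that the number of bits needed depends on the distance between the two vectors rather than on their norms.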

Figures (26)

  • Figure 1: Impact of the number of peers $s\in \{10, 20, 30, 40\}$ on convergence, for $n = 100$ clients, $14$-bit quantization, on CelebA, using non-i.i.d data.
  • Figure 2: Convergence comparison relative to simulated time between QuAFL and FedAvg for ResNet20/CIFAR10.
  • Figure 3: The impact of averaging variants vs. validation accuracy on ResNet/CelebA, non-i.i.d data.
  • Figure 4: ResNet20/CIFAR10 experiment where Fast and Slow clients have non-i.i.d. data from different classes.
  • Figure 5: Experiment in which the average time per client per local step is uniformly random between 2 and 9, showing superior performance relative to FedBuff, which becomes unstable when both data and client speeds are heterogeneous (ResNet20/CIFAR10).
  • ...and 21 more figures

Theorems & Definitions (52)

  • Lemma 3.1
  • Theorem 3.2
  • Corollary 3.3
  • Lemma 3.4
  • Lemma 3.5
  • Lemma 3.6
  • Lemma 3.7
  • Lemma B.1
  • proof
  • Lemma B.2
  • ...and 42 more