Stochastic Controlled Averaging for Federated Learning with Communication Compression

Xinmeng Huang; Ping Li; Xiaoyun Li

TL;DR

The paper addresses how to preserve performance in federated learning (FL) when communication is compressed, under arbitrary data heterogeneity and partial client participation. It introduces a simplified formulation of stochastic controlled averaging, built on SCAFFOLD, that halves uplink communication, and derives two algorithms from it: SCALLION for unbiased compression and SCAFCOM for biased compression with momentum. Theoretical results establish state-of-the-art nonconvex convergence and tight asymptotic communication and computation complexities under minimal assumptions. Experiments on MNIST and Fashion-MNIST show near-full-precision performance with substantially reduced uplink communication, and better results than existing compressed FL methods. The approach combines momentum, error-feedback-like ideas, and a single-variable uplink scheme, and targets large-scale FL deployments where communication overhead must be cut without compromising convergence under heterogeneous client data and partial participation.

Abstract

Communication compression, a technique aiming to reduce the information volume to be transmitted over the air, has gained great interest in Federated Learning (FL) for its potential to alleviate communication overhead. However, communication compression brings forth new challenges in FL due to the interplay of compression-incurred information distortion and inherent characteristics of FL such as partial participation and data heterogeneity. Despite recent developments, the performance of compressed FL approaches has not been fully exploited. The existing approaches either cannot accommodate arbitrary data heterogeneity or partial participation, or require stringent conditions on compression. In this paper, we revisit the seminal stochastic controlled averaging method by proposing an equivalent but more efficient and simplified formulation with halved uplink communication costs. Building upon this implementation, we propose two compressed FL algorithms, SCALLION and SCAFCOM, to support unbiased and biased compression, respectively. Both proposed methods outperform the existing compressed FL methods in terms of communication and computation complexities. Moreover, SCALLION and SCAFCOM accommodate arbitrary data heterogeneity and do not make any additional assumptions on compression errors. Experiments show that SCALLION and SCAFCOM can match the performance of the corresponding full-precision FL approaches with substantially reduced uplink communication, and outperform recent compressed FL methods under the same communication budget.

Paper Structure (40 sections, 17 theorems, 108 equations, 4 figures, 1 table, 4 algorithms)

Introduction
Main Results & Contributions
Related Work
Communication compression & error feedback.
Federated learning with compression.
Federated learning with momentum.
Problem Setup
SCALLION: Single-round Compressed Communication
Background of SCAFFOLD
Development of SCALLION
We only need to communicate $\Delta_i^t$.
Benefits of compressing $\Delta_i^t$.
Comparison with FedPAQ reisizadeh2020fedpaq, FedCOM haddadpour2021federated, Fed-EF li2023analysis.
Comparison with FedCOMGATE haddadpour2021federated.
Convergence of SCALLION
...and 25 more sections

Key Result

Theorem 1

Under Assumptions asp:smooth and asp:gd-noise, supposing clients apply mutually independent $\omega$-unbiased compressors, if we initialize $c_i^0=\nabla f_i(x^0)$ and $c^0=\nabla f(x^0)$, and set learning rates $\eta_l$, $\eta_g$ and the scaling factor $\alpha$ as in eqn:scallion-para, then SCALLION ... where $\Delta \triangleq f(x^0)-\min f(x)$. A detailed version and the proof are in Appendix app:sc
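
For readers unfamiliar with the term, the $\omega$-unbiased compressor assumed in Theorem 1 is usually defined in the compression literature as below. This is the standard form and is only a reference sketch; the paper's Definition 1 may differ in details, and the symbols $\mathcal{C}$ (the compressor) and $d$ (the model dimension) are notation assumed here rather than taken from this page.

$$
\mathbb{E}\big[\mathcal{C}(x)\big] = x, \qquad \mathbb{E}\big\|\mathcal{C}(x) - x\big\|^2 \le \omega \,\|x\|^2 \quad \text{for all } x \in \mathbb{R}^d .
$$

Under this form, $\omega = 0$ corresponds to no compression, and larger $\omega$ means more aggressive (noisier) compression.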

Figures (4)

Figure 1: Interplay of client-drift and inaccurate message aggregation incurred by communication compression in FedAvg is illustrated for $2$ clients with $3$ local steps (i.e., $S=N=2$, $K=3$). The client updates $y_i$ (blue circle) move towards the individual client optima $x_i^\star$ (blue square). The server updates (black circle) move towards a distorted proxy, depending on the degree of compression, of the full-precision averaged model $\frac{1}{N}\sum_{i=1}^N x_i^\star$ (grey circle), instead of the true optimum $x^\star$ (white square).
Figure 2: Train loss and test accuracy of SCAFCOM (Algorithm alg:scafcom) and Fed-EF li2023analysis with biased Top-$r$ compressors on MNIST (top row) and FMNIST (bottom row).
Figure 3: Train loss and test accuracy of SCALLION (Algorithm alg:scallion) and FedCOMGATE haddadpour2021federated with unbiased random dithering on MNIST (top row) and FMNIST (bottom row).
Figure 4: Test accuracy on MNIST and FMNIST of SCAFCOM (biased Top-0.01) and SCALLION (unbiased 2-bit random dithering), with different $\beta$ and $\alpha$ values.

Theorems & Definitions (39)

Definition 1: $\omega$-unbiased compressor
Example 1: Random sparsification wangni2018gradient
Example 2: Random dithering alistarh2017qsgd
Theorem 1: SCALLION with unbiased compression
Remark 1
Remark 2: Weak assumptions of this work
Definition 2: $q^2$-contractive compressor
Example 3: Top-$r$ operator
Example 4: Grouped sign
Theorem 2: SCAFCOM with biased compression
...and 29 more
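
The two compressor classes listed above can be illustrated with a small sketch. It is not the paper's implementation: the function names `rand_k` and `top_k`, the count-based parameter `k`, and the test vector are all assumptions made for illustration (the paper's Top-$r$ appears to be parameterized by a ratio, e.g. Top-0.01). Random sparsification with rescaling is a textbook unbiased compressor with $\omega = d/k - 1$, while the Top-$k$ operator is biased but contractive with $q^2 = 1 - k/d$.

```python
import random

def rand_k(x, k, rng):
    """Random sparsification (hypothetical helper): keep k uniformly chosen
    coordinates and rescale them by d/k so that E[rand_k(x)] = x."""
    d = len(x)
    kept = set(rng.sample(range(d), k))
    return [x[i] * d / k if i in kept else 0.0 for i in range(d)]

def top_k(x, k):
    """Top-k operator (hypothetical helper): keep the k largest-magnitude
    coordinates unchanged. Deterministic and biased, but contractive:
    ||top_k(x) - x||^2 <= (1 - k/d) * ||x||^2."""
    d = len(x)
    order = sorted(range(d), key=lambda i: abs(x[i]), reverse=True)
    kept = set(order[:k])
    return [x[i] if i in kept else 0.0 for i in range(d)]

def sq_norm(v):
    """Squared Euclidean norm of a list of floats."""
    return sum(a * a for a in v)

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(20)]  # arbitrary test vector
k = 5

# Unbiasedness check: the empirical mean of rand_k(x) should approach x.
trials = 5000
mean = [0.0] * len(x)
for _ in range(trials):
    c = rand_k(x, k, rng)
    mean = [m + ci / trials for m, ci in zip(mean, c)]
bias = sq_norm([m - xi for m, xi in zip(mean, x)]) / sq_norm(x)

# Contraction check: relative squared error of top_k is at most 1 - k/d.
t = top_k(x, k)
top_err = sq_norm([ti - xi for ti, xi in zip(t, x)]) / sq_norm(x)

print("relative squared bias of rand_k mean:", bias)
print("relative squared error of top_k:", top_err, "bound:", 1 - k / len(x))
```

Both operators transmit only `k` of the `d` coordinates. The difference is that `rand_k` trades extra variance for unbiasedness, whereas `top_k` has lower error on a given vector but a systematic bias, which is the distinction behind pairing SCALLION with unbiased compressors and SCAFCOM with biased ones.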