E-3SFC: Communication-Efficient Federated Learning with Double-way Features Synthesizing

Yuhao Zhou, Yuxin Tian, Mingjia Shi, Yuanxi Li, Yanan Sun, Qing Ye, Jiancheng Lv

TL;DR

This work addresses the high communication cost in Federated Learning by introducing E-3SFC, a framework that compresses gradients into tiny synthetic features via 3SFC, while employing error feedback and a dynamic budget scheduler. A novel double-way compression scheme extends compression to the global model download, synchronized by shared training priors, and a budget scheduler allocates communication budgets adaptively across rounds. Theoretical analysis establishes convergence rates under strongly convex and non-convex settings, with explicit dependence on aggregation noise and compression parameters. Empirically, E-3SFC achieves up to 13.4% accuracy gains with as much as 111.6× reduction in communication across six datasets and six models, outperforming state-of-the-art baselines and demonstrating practical impact for scalable FL.

Abstract

The exponential growth in model sizes has significantly increased the communication burden in Federated Learning (FL). Existing methods that alleviate this burden by transmitting compressed gradients often incur high compression errors, which slow down the model's convergence. To simultaneously achieve high compression effectiveness and lower compression errors, we study the gradient compression problem from a novel perspective. Specifically, we propose a systematic algorithm termed Extended Single-Step Synthetic Features Compressing (E-3SFC), which consists of three sub-components: the Single-Step Synthetic Features Compressor (3SFC), a double-way compression algorithm, and a communication budget scheduler. First, we regard the process of computing a model's gradients as decompressing gradients from the corresponding inputs, and the inverse process as compressing the gradients. Based on this, we introduce a novel gradient compression method termed 3SFC, which uses the model itself as a decompressor, leveraging training priors such as model weights and objective functions. 3SFC compresses raw gradients into tiny synthetic features in a single-step simulation, incorporating error feedback to minimize overall compression errors. To further reduce communication overhead, 3SFC is extended to E-3SFC, allowing double-way compression and dynamic communication budget scheduling. Our theoretical analysis under both strongly convex and non-convex conditions demonstrates that 3SFC achieves linear and sub-linear convergence rates with aggregation noise. Extensive experiments across six datasets and six models reveal that 3SFC outperforms state-of-the-art methods by up to 13.4% while reducing communication costs by 111.6 times. These findings suggest that 3SFC can significantly enhance communication efficiency in FL without compromising model performance.
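
The "model as decompressor" idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it assumes a linear model with loss $\frac{1}{2}\|Wx - y\|^2$, whose gradient with respect to $W$ at one sample is $(Wx - y)x^\top$, and it swaps in alternating least squares for whatever optimizer the paper uses to fit the synthetic features. The names `compress`, `decompress`, `x_syn`, `y_syn`, and `residual` are hypothetical. The client sends one synthetic sample (`d + m` numbers) instead of the full gradient (`m * d` numbers), the receiver recovers an approximate gradient by running the shared model on that sample, and the client carries the compression error into the next round as error feedback.

```python
import math
import random

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def frob(A):
    return math.sqrt(sum(v * v for row in A for v in row))

def decompress(W, x_syn, y_syn):
    """Gradient of 0.5*||W x - y||^2 w.r.t. W at the synthetic sample: (W x - y) x^T."""
    r = [p - y for p, y in zip(matvec(W, x_syn), y_syn)]
    return [[ri * xj for xj in x_syn] for ri in r]

def compress(W, G, steps=50):
    """Fit one synthetic sample (x, y) whose gradient under W approximates G.

    Toy assumption: alternating least squares on the rank-1 form r x^T,
    where r = W x - y is the residual the label y has to produce.
    """
    m, d = len(G), len(G[0])
    x = [1.0] * d
    for _ in range(steps):
        xx = sum(v * v for v in x)
        r = [sum(G[i][j] * x[j] for j in range(d)) / xx for i in range(m)]
        rr = sum(v * v for v in r)
        if rr == 0.0:  # G x vanished; nothing left to fit along this direction
            break
        x = [sum(G[i][j] * r[i] for i in range(m)) / rr for j in range(d)]
    # Final least-squares residual for the chosen x, then the label that yields it.
    xx = sum(v * v for v in x)
    r = [sum(G[i][j] * x[j] for j in range(d)) / xx for i in range(m)]
    y = [p - ri for p, ri in zip(matvec(W, x), r)]
    return x, y

random.seed(0)
m, d = 4, 6  # toy model: W has m * d = 24 parameters
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
residual = [[0.0] * d for _ in range(m)]  # error-feedback memory kept on the client

for round_idx in range(3):
    G = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]  # stand-in for a real gradient
    target = add(G, residual)            # error feedback: re-inject what was lost last round
    x_syn, y_syn = compress(W, target)   # payload: d + m numbers
    residual = sub(target, decompress(W, x_syn, y_syn))
    print(round_idx, len(x_syn) + len(y_syn), m * d, round(frob(residual) / frob(target), 3))
```

Each printed row shows the round index, the payload size, the uncompressed gradient size, and the relative compression error. In this toy setting a single synthetic sample can only express a rank-1 gradient, so the error on random full-rank targets stays large; the sketch is meant to show the data flow, not the accuracy reported in the paper.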

Paper Structure

This paper contains 17 sections, 6 theorems, 42 equations, 8 figures, 7 tables, 1 algorithm.

Key Result

Lemma 5.1

Under Assumption 4, with $\langle w^{t_a} - w^*, \nabla F(w^{t_a}) \rangle \geq F(w^{t_a}) - F(w^*) + \frac{\mu_F}{2} \|w^{t_a} - w^*\|^2$ and $\tilde{\eta} = K \eta$, the upper bound of the local shift is:

Figures (8)

  • Figure 1: Information Compression Rate vs. Model Convergence: Convergence slows as the compression rate decreases. The evaluated MLP model is trained on non-IID MNIST with 20 FL clients.
  • Figure 2: Relationship between E-3SFC and 3SFC
  • Figure 3: Above: When gradients obtained by 128 steps of SGD are fitted using 128 steps of simulation [goetz2020federated], the optimization collapses. In contrast, E-3SFC requires only one step of simulation, using less computation and storage while achieving significantly better results. Below: Before the collapse, the gradients of the trainable parameters behave much like a gradient explosion: their magnitude increases as they back-propagate from the 128th to the first group of parameters. This is a possible reason for the collapse.
  • Figure 4: The general workflow of 3SFC. When compressing in ❹, a set of trainable parameters and labels (i.e., synthetic features) will first be fed into the frozen local model to calculate model gradients. Then, calculated model gradients will be compared with real model gradients to optimize the synthetic features. When decompressing in ❶, simply feed the local model with the received synthetic features and use the generated gradients to update the global model.
  • Figure 5: Illustration of our manual dataset partitions for 20 clients based on the Dirichlet distribution. Each bar represents a client, and different segments with different colors of a bar represent different labels. As can be seen, different clients have different dataset sizes and dataset distributions, and some clients only have some of the labels.
  • ...and 3 more figures

Theorems & Definitions (12)

  • Lemma 5.1: Bounded Local Shift
  • Lemma 5.2: Bounded Local Approximation
  • Theorem 5.3
  • Corollary 5.3.1
  • Remark 5.3.1
  • Theorem 5.4
  • Corollary 5.4.1
  • Remark 5.4.1
  • proof
  • proof
  • ...and 2 more