Layer-wise Update Aggregation with Recycling for Communication-Efficient Federated Learning

Jisoo Kim, Sungmin Kang, Sunwoo Lee

TL;DR

Federated Learning suffers from high communication costs, especially with large models. This paper introduces FedLUAR, a layer-wise update recycling framework that prioritizes layers by a gradient-to-weight ratio metric $s_{t,l} = \frac{\| \Delta_{t,l} \|}{\| \mathbf{x}_{t,l} \|}$ and recycles updates for a randomly selected subset of layers with probability $p_{t,l} = \frac{1/s_{t,l}}{\sum_l 1/s_{t,l}}$, thereby reducing communication while maintaining accuracy. The authors provide a convergence analysis under standard non-convex assumptions, showing bounded noise and convergence to a neighborhood when certain conditions on the learning rate and recycled layers are met. Empirically, FedLUAR delivers accuracy close to FedAvg across CIFAR-10/100, FEMNIST, and AG News, with communication costs dropping to as low as 17% of the FedAvg baseline in some datasets, and demonstrates compatibility with other FL optimizers. This approach offers a practical, architecture-agnostic path to scalable, communication-efficient federated learning, with potential extensions to large language model fine-tuning.
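The two formulas above can be made concrete with a minimal sketch. It assumes each layer is stored as a flat list of floats keyed by layer name, and that recycled layers are drawn without replacement. The function names, the `eps` guard against division by zero, and the `num_recycled` parameter are illustrative assumptions, not taken from the paper.

```python
import math
import random


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def layer_scores(updates, weights, eps=1e-12):
    """s_l = ||update_l|| / ||weight_l|| per layer.

    eps is a numerical guard added for this sketch (not in the source).
    """
    return {
        name: l2_norm(updates[name]) / (l2_norm(weights[name]) + eps)
        for name in updates
    }


def recycle_probabilities(scores, eps=1e-12):
    """p_l = (1/s_l) / sum_k (1/s_k).

    Layers whose update is small relative to their weights get a
    higher probability of being recycled.
    """
    inverse = {name: 1.0 / (s + eps) for name, s in scores.items()}
    total = sum(inverse.values())
    return {name: inv / total for name, inv in inverse.items()}


def sample_recycled_layers(probs, num_recycled, rng):
    """Draw num_recycled distinct layers, weighted by probs.

    Sampling without replacement is an assumption of this sketch.
    """
    remaining = dict(probs)
    chosen = []
    for _ in range(min(num_recycled, len(remaining))):
        names = list(remaining)
        pick = rng.choices(names, weights=[remaining[n] for n in names], k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


# Toy example: "fc" has a much smaller update-to-weight ratio than "conv".
updates = {"conv": [3.0, 4.0], "fc": [0.6, 0.8]}
weights = {"conv": [6.0, 8.0], "fc": [6.0, 8.0]}
scores = layer_scores(updates, weights)
probs = recycle_probabilities(scores)
recycled = sample_recycled_layers(probs, 1, random.Random(0))
```

In this toy example the ratios are roughly 0.5 for `conv` and 0.1 for `fc`, so `fc` receives about 5/6 of the recycling probability.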

Abstract

Expensive communication cost is a common performance bottleneck in Federated Learning (FL), which makes it less appealing in real-world applications. Many communication-efficient FL methods focus on discarding part of the model updates, mostly based on gradient magnitude. In this study, we find that recycling previous updates, rather than simply dropping them, more effectively reduces the communication cost while maintaining FL performance. We propose FedLUAR, a Layer-wise Update Aggregation with Recycling scheme for communication-efficient FL. We first define a metric that quantifies the extent to which the aggregated gradients influence the model parameter values in each layer. FedLUAR selects a few layers based on this metric and recycles their previous updates on the server side. Our extensive empirical study demonstrates that the update recycling scheme significantly reduces the communication cost while maintaining model accuracy. For example, our method achieves nearly the same AG News accuracy as FedAvg while reducing the communication cost to just 17% of the FedAvg cost.
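The server-side recycling step described in the abstract might look like the sketch below. It makes several assumptions: clients omit the recycled layers from their payload, fresh layers are combined by an unweighted average, and the model is updated as weights plus update. The names `aggregate_with_recycling` and `apply_update` are hypothetical.

```python
def aggregate_with_recycling(client_updates, previous_update, recycled_layers):
    """Build this round's aggregated update, layer by layer.

    client_updates: list of dicts (one per client) holding only the
        layers that were actually transmitted.
    previous_update: dict with last round's aggregated update for all layers.
    recycled_layers: set of layer names whose previous update is reused.
    """
    new_update = {}
    for name, prev in previous_update.items():
        if name in recycled_layers:
            # Recycle: reuse last round's update, so nothing is communicated.
            new_update[name] = list(prev)
        else:
            # Fresh layer: unweighted average over clients (an assumption).
            columns = zip(*(client[name] for client in client_updates))
            new_update[name] = [sum(col) / len(client_updates) for col in columns]
    return new_update


def apply_update(weights, update):
    """x_{t+1} = x_t + update (sign convention assumed for this sketch)."""
    return {
        name: [w + u for w, u in zip(weights[name], update[name])]
        for name in weights
    }


# Toy round: two clients send only "conv"; "fc" is recycled.
client_updates = [{"conv": [1.0, 3.0]}, {"conv": [3.0, 5.0]}]
previous_update = {"conv": [0.0, 0.0], "fc": [0.5, -0.5]}
update = aggregate_with_recycling(client_updates, previous_update, {"fc"})
weights = apply_update({"conv": [1.0, 1.0], "fc": [1.0, 1.0]}, update)
```

Here `conv` is averaged from the two client payloads, while `fc` reuses the stored update, so no `fc` parameters are transmitted in this round.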

Paper Structure

This paper contains 16 sections, 6 theorems, 30 equations, 7 figures, 11 tables, 2 algorithms.

Key Result

Lemma 3.1

(noise) Under Assumptions 1 to 3, if the learning rate satisfies $\eta \leq \frac{1}{\mathcal{L}\tau}$, the accumulated noise is bounded by a quantity that depends on $m$ and $\kappa$, where $m$ is the number of clients and $\kappa \leq 1$ is the ratio of $\| \nabla \hat{F}(\mathbf{x}_t) \|^2$ to $\| \nabla F(\mathbf{x}_t) \|^2$.

Figures (7)

  • Figure 1: The layer-wise gradient norm and weight norm comparison (left) and the ratio of the gradient norm to the weight norm (right). The top row (a) shows FEMNIST (CNN) and the bottom row (b) shows CIFAR-10 (ResNet20). The layers with the smallest gradients are not always the ones that affect the model parameter values the least.
  • Figure 2: A schematic illustration of FedLUAR. Each client sends out updates for layers with large $s_{t,l}$ values only. For the other layers, the server 'recycles' the previous updates.
  • Figure 3: The number of model aggregations per layer. In all the benchmarks, FedLUAR dramatically reduces the number of model aggregations in each layer. The difference between the FedAvg count and the FedLUAR count shows how many times the updates were recycled (how many communications were skipped).
  • Figure 4: The learning curve comparisons for CIFAR-10 (ResNet20). The x-axis represents the communication cost relative to FedAvg. To highlight the differences clearly, we selectively present comparisons among four different methods only.
  • Figure 5: The learning curve comparisons for CIFAR-100 (Wide-ResNet28-10). The x-axis represents the communication cost relative to FedAvg. FedPAQ has the lowest communication cost over 300 epochs, but it loses too much accuracy. FedLUAR loses almost no accuracy while significantly reducing the communication cost.
  • ...and 2 more figures

Theorems & Definitions (10)

  • Lemma 3.1
  • Theorem 3.2
  • Lemma A.1
  • proof
  • Lemma A.2
  • proof
  • Lemma A.3
  • proof
  • Theorem A.4
  • proof