Layer-wise Update Aggregation with Recycling for Communication-Efficient Federated Learning
Jisoo Kim, Sungmin Kang, Sunwoo Lee
TL;DR
Federated Learning (FL) suffers from high communication costs, especially with large models. This paper introduces FedLUAR, a layer-wise update recycling framework that scores each layer by the gradient-to-weight ratio $s_{t,l} = \frac{\| \Delta_{t,l} \|}{\| \mathbf{x}_{t,l} \|}$ and randomly selects layers for recycling with probability $p_{t,l} = \frac{1/s_{t,l}}{\sum_{l'} 1/s_{t,l'}}$. Layers whose updates are small relative to their weights are therefore the most likely to reuse their previous update instead of communicating a new one, which reduces communication while maintaining accuracy. The authors provide a convergence analysis under standard non-convex assumptions, showing bounded noise and convergence to a neighborhood when certain conditions on the learning rate and the recycled layers are met. Empirically, FedLUAR delivers accuracy close to FedAvg on CIFAR-10/100, FEMNIST, and AG News, with communication cost dropping to as low as 17% of the FedAvg baseline on AG News, and it is compatible with other FL optimizers. The approach offers a practical, architecture-agnostic path to scalable, communication-efficient federated learning, with potential extensions to large language model fine-tuning.
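To make the two formulas concrete, the following is a minimal NumPy sketch of the layer score $s_{t,l}$ and the recycling probability $p_{t,l}$. The function names, the small `eps` guard against division by zero, the number of recycled layers `k`, and sampling without replacement are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def layer_scores(updates, weights, eps=1e-12):
    """Gradient-to-weight ratio s_l = ||Delta_l|| / ||x_l|| for each layer.

    `updates` and `weights` are lists of arrays, one per layer.
    `eps` is an assumed numerical guard, not part of the formula.
    """
    return np.array([
        np.linalg.norm(d) / (np.linalg.norm(w) + eps)
        for d, w in zip(updates, weights)
    ])

def recycle_probabilities(scores, eps=1e-12):
    """p_l = (1 / s_l) / sum_l' (1 / s_l'): small scores get high probability."""
    inv = 1.0 / (scores + eps)
    return inv / inv.sum()

def sample_recycled_layers(probs, k, rng):
    """Draw k distinct layer indices according to p_l (k is an assumed knob)."""
    return rng.choice(len(probs), size=k, replace=False, p=probs)

# Toy model with three layers of four parameters each.
updates = [np.full(4, 0.1), np.full(4, 1.0), np.full(4, 0.5)]
weights = [np.ones(4), np.ones(4), np.ones(4)]

scores = layer_scores(updates, weights)
probs = recycle_probabilities(scores)
recycled = sample_recycled_layers(probs, k=2, rng=np.random.default_rng(0))
```

In this toy setup the first layer has the smallest update relative to its weights, so it receives the largest recycling probability.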
Abstract
High communication cost is a common performance bottleneck in Federated Learning (FL), which makes it less appealing in real-world applications. Many communication-efficient FL methods discard part of the model updates, mostly based on gradient magnitude. In this study, we find that recycling previous updates, rather than simply dropping them, reduces the communication cost more effectively while maintaining FL performance. We propose FedLUAR, a Layer-wise Update Aggregation with Recycling scheme for communication-efficient FL. We first define a metric that quantifies the extent to which the aggregated gradients influence the model parameter values in each layer. FedLUAR selects a few layers based on this metric and recycles their previous updates on the server side. Our extensive empirical study demonstrates that the update recycling scheme significantly reduces the communication cost while maintaining model accuracy. For example, our method achieves nearly the same AG News accuracy as FedAvg while reducing the communication cost to just 17% of FedAvg's.
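The server-side recycling step described in the abstract could look roughly like the sketch below. It assumes a plain unweighted mean over clients for the communicated layers, represents layers that clients skip as `None`, and uses hypothetical names (`aggregate_with_recycling`, `communicated_fraction`); the paper's actual aggregation rule and bookkeeping may differ.

```python
import numpy as np

def aggregate_with_recycling(client_updates, prev_update, recycled):
    """Build the new global update layer by layer.

    client_updates: list over clients, each a list of per-layer arrays;
                    entries for recycled layers may be None because those
                    layers are not transmitted (assumed convention).
    prev_update:    list of per-layer arrays from the previous round.
    recycled:       set of layer indices whose previous update is reused.
    """
    new_update = []
    for l, prev in enumerate(prev_update):
        if l in recycled:
            # Reuse the previous round's update; nothing was communicated.
            new_update.append(prev.copy())
        else:
            # Assumed unweighted average of the clients' fresh updates.
            fresh = [c[l] for c in client_updates]
            new_update.append(np.mean(fresh, axis=0))
    return new_update

def communicated_fraction(layer_sizes, recycled):
    """Share of parameters still sent per round, relative to sending all layers."""
    sent = sum(n for l, n in enumerate(layer_sizes) if l not in recycled)
    return sent / sum(layer_sizes)

# Two clients, two layers; layer 0 is recycled, so clients skip it.
clients = [
    [None, np.array([1.0, 3.0])],
    [None, np.array([3.0, 5.0])],
]
previous = [np.array([0.5, 0.5]), np.array([9.0, 9.0])]

update = aggregate_with_recycling(clients, previous, recycled={0})
fraction = communicated_fraction([2, 2], recycled={0})
```

Here layer 0 keeps its previous update while layer 1 is averaged across clients, and only half of the parameters are transmitted in that round.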
