FedScalar: A Communication efficient Federated Learning
M. Rostami, S. S. Kia
TL;DR
FedScalar tackles the communication bottleneck in Federated Learning by encoding each client's local update into a single scalar via a random projection, sending this scalar along with a seed, enabling the server to reconstruct and average updates efficiently. The paper proves a convergence rate of $O(d/\sqrt{K})$ for smooth non-convex objectives and shows that using a Rademacher distribution for the projection vector reduces variance in aggregation. Numerical simulations on a small neural network demonstrate near-FedAvg performance with only two scalars communicated per client per round, highlighting substantial communication savings. The work opens avenues for accelerated variants and privacy enhancements by leveraging scalar encodings.
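A short calculation suggests why a Rademacher projection could have lower variance than a Gaussian one. This is a sketch, not the paper's derivation. The symbols $\delta$ (a client's local update difference) and $v$ (the random projection vector) are illustrative names. The sketch assumes that the entries of $v \in \mathbb{R}^d$ are i.i.d. with zero mean and unit variance, and that the server reconstructs $\langle \delta, v \rangle v$.

```latex
% Unbiasedness holds for both distributions, since E[v v^T] = I_d:
\mathbb{E}\big[\langle \delta, v \rangle\, v\big] = \mathbb{E}\big[v v^{\top}\big]\,\delta = \delta .

% Gaussian entries, v \sim \mathcal{N}(0, I_d):
\mathbb{E}\,\big\|\langle \delta, v \rangle\, v\big\|^{2} = (d + 2)\,\|\delta\|^{2} .

% Rademacher entries, v_j \in \{-1, +1\}, so \|v\|^{2} = d exactly:
\mathbb{E}\,\big\|\langle \delta, v \rangle\, v\big\|^{2} = d\,\|\delta\|^{2} .
```

Under these assumptions the second moment of the reconstructed vector drops from $(d+2)\|\delta\|^2$ to $d\|\delta\|^2$, which is consistent with the variance reduction stated above.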
Abstract
Federated learning (FL) has gained considerable popularity for distributed machine learning due to its ability to preserve the privacy of participating agents by eliminating the need for data aggregation. Nevertheless, communication costs between agents and the central server in FL are substantial in large-scale problems and remain a limiting factor for this algorithm. This paper introduces an innovative algorithm, called FedScalar, within the FL framework aimed at improving communication efficiency. Unlike traditional FL methods that require agents to send high-dimensional vectors to the server, FedScalar enables agents to communicate updates using only two scalar values. Each agent encodes its updated model parameters into a scalar via the inner product between its local update difference and a random vector, which is transmitted to the server along with the agent's local random seed value. The server then averages the received scalar values and decodes the information by projecting the averaged scalar onto the regenerated random vector using the corresponding agent seed values. Our method thereby significantly reduces communication overhead. Technically, we demonstrate that the proposed algorithm achieves a convergence rate of $O(d/\sqrt{K})$ to a stationary point for smooth, non-convex loss functions. Additionally, our analysis shows that changing the underlying distribution of the random vector generated by the server from Gaussian to Rademacher distribution reduces the variance during the aggregation step of the algorithm. Finally, we validate the performance and communication efficiency of our algorithm with numerical simulations.
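The encode/decode step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`random_vector`, `encode`, `decode_and_average`) are hypothetical, as is the use of NumPy's seeded generator. The aggregation rule, which averages each scalar times its regenerated vector, is one plausible reading of the abstract.

```python
import numpy as np

def random_vector(seed, d, dist="rademacher"):
    """Regenerate the projection vector from a seed (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    if dist == "rademacher":
        return rng.choice([-1.0, 1.0], size=d)  # entries are +1 or -1
    return rng.standard_normal(d)  # Gaussian alternative

def encode(delta, seed, dist="rademacher"):
    """Agent side: compress the local update difference into (scalar, seed)."""
    v = random_vector(seed, delta.size, dist)
    return float(delta @ v), seed

def decode_and_average(messages, d, dist="rademacher"):
    """Server side (assumed rule): average scalar * regenerated vector."""
    estimate = np.zeros(d)
    for scalar, seed in messages:
        estimate += scalar * random_vector(seed, d, dist)
    return estimate / len(messages)

# Usage: three agents, each sending only two numbers to the server.
d = 4
deltas = [np.array([0.1, -0.2, 0.3, 0.0]) for _ in range(3)]
messages = [encode(delta, seed) for seed, delta in enumerate(deltas)]
aggregate = decode_and_average(messages, d)
```

Each message carries two scalars regardless of the model dimension `d`. The reconstructed aggregate is a noisy estimate of the average update, and it is unbiased when the projection entries have zero mean and unit variance.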
