FedScalar: A Communication efficient Federated Learning

M. Rostami, S. S. Kia

TL;DR

FedScalar tackles the communication bottleneck in Federated Learning by encoding each client's local update into a single scalar via a random projection, sending this scalar along with a seed, enabling the server to reconstruct and average updates efficiently. The paper proves a convergence rate of $O(d/\sqrt{K})$ for smooth non-convex objectives and shows that using a Rademacher distribution for the projection vector reduces variance in aggregation. Numerical simulations on a small neural network demonstrate near-FedAvg performance with only two scalars communicated per client per round, highlighting substantial communication savings. The work opens avenues for accelerated variants and privacy enhancements by leveraging scalar encodings.

Abstract

Federated learning (FL) has gained considerable popularity for distributed machine learning due to its ability to preserve the privacy of participating agents by eliminating the need for data aggregation. Nevertheless, communication costs between agents and the central server in FL are substantial in large-scale problems and remain a limiting factor for this algorithm. This paper introduces an innovative algorithm, called FedScalar, within the FL framework aimed at improving communication efficiency. Unlike traditional FL methods that require agents to send high-dimensional vectors to the server, FedScalar enables agents to communicate updates using only two scalar values. Each agent encodes its updated model parameters into a scalar via the inner product between its local update difference and a random vector, which is transmitted to the server along with the agent's local random seed value. The server then averages the received scalar values and decodes the information by projecting the averaged scalar onto the regenerated random vector using the corresponding agent seed values. Our method thereby significantly reduces communication overhead. Technically, we demonstrate that the proposed algorithm achieves a convergence rate of $O(d/\sqrt{K})$ to a stationary point for smooth, non-convex loss functions. Additionally, our analysis shows that changing the underlying distribution of the random vector generated by the server from Gaussian to Rademacher distribution reduces the variance during the aggregation step of the algorithm. Finally, we validate the performance and communication efficiency of our algorithm with numerical simulations.
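
The encode/decode step described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`projection_vector`, `encode`, `decode_and_average`) are hypothetical, Python's seeded `random.Random` stands in for whatever generator the agents and server share, and the server is assumed to average the per-agent products of scalar and regenerated vector, which is one reading of the abstract.

```python
import random

def projection_vector(seed, d, rademacher=True):
    """Regenerate the d-dimensional random vector tied to a seed (hypothetical helper)."""
    rng = random.Random(seed)  # same seed -> same vector on agent and server
    if rademacher:
        return [rng.choice((-1.0, 1.0)) for _ in range(d)]
    return [rng.gauss(0.0, 1.0) for _ in range(d)]

def encode(delta, seed, rademacher=True):
    """Agent side: compress the local update difference into one scalar."""
    v = projection_vector(seed, len(delta), rademacher)
    return sum(di * vi for di, vi in zip(delta, v))

def decode_and_average(messages, d, rademacher=True):
    """Server side: rebuild each agent's vector from its seed, average scalar * vector."""
    avg = [0.0] * d
    for scalar, seed in messages:
        v = projection_vector(seed, d, rademacher)
        for i in range(d):
            avg[i] += scalar * v[i] / len(messages)
    return avg

# Two agents, each sending only (scalar, seed) instead of a d-dimensional update.
deltas = {11: [0.5, -1.0, 2.0], 22: [1.5, 0.0, -0.5]}
messages = [(encode(delta, seed), seed) for seed, delta in deltas.items()]
estimate = decode_and_average(messages, d=3)
```

A single decoded `estimate` is a noisy stand-in for the true average update; it matches that average only in expectation over the random vectors, which is the source of the dimension-dependent factor in the stated convergence rate.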

Paper Structure (6 sections, 6 theorems, 38 equations, 3 figures, 1 algorithm)

Key Result

Lemma III.1

Let $\boldsymbol{\mathbf{v}} \in \mathbb{R}^d$ be a random vector with each entry $v_i$ being independent and identically distributed with zero mean and unit variance. Then, the projected directional derivative along $\boldsymbol{\mathbf{v}}$ is an unbiased estimate of $\nabla f(\boldsymbol{\mathbf{x}})$ ... $\Box$
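
The reasoning behind this lemma fits in one line. As an assumption (the statement is truncated here), take the projected directional derivative to be $\langle \nabla f(\boldsymbol{\mathbf{x}}), \boldsymbol{\mathbf{v}} \rangle \boldsymbol{\mathbf{v}}$. Independence and zero mean give $\mathbb{E}[v_i v_j] = 0$ for $i \neq j$, and zero mean with unit variance gives $\mathbb{E}[v_i^2] = 1$, so $\mathbb{E}[\boldsymbol{\mathbf{v}} \boldsymbol{\mathbf{v}}^\top] = I_d$ and

$$
\mathbb{E}\big[\langle \nabla f(\boldsymbol{\mathbf{x}}), \boldsymbol{\mathbf{v}} \rangle \boldsymbol{\mathbf{v}}\big]
= \mathbb{E}\big[\boldsymbol{\mathbf{v}} \boldsymbol{\mathbf{v}}^\top\big] \nabla f(\boldsymbol{\mathbf{x}})
= I_d \nabla f(\boldsymbol{\mathbf{x}})
= \nabla f(\boldsymbol{\mathbf{x}}).
$$

Both the standard normal and the Rademacher distribution satisfy the zero-mean, unit-variance condition, so this argument covers either choice.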

Figures (3)

  • Figure 1: Federated Learning structure where $\boldsymbol{\mathbf{x}}$ represents the set of server's parameters.
  • Figure 2: Training loss plot of Algorithm 1 for the cases $\boldsymbol{\mathbf{v}}_k$ sampled from a normal distribution and a Rademacher distribution vs FedAvg Algorithm.
  • Figure 3: Accuracy plot on the test dataset of Algorithm 1 for -- vs FedAvg Algorithm.

Theorems & Definitions (11)

  • Lemma III.1: Unbiasedness of the projected directional derivative along a random vector $\boldsymbol{\mathbf{v}}$ [rostami2024forward]
  • Lemma III.2: Upper bound on the projected directional derivative along a random vector $\boldsymbol{\mathbf{v}}$ [nesterov2017random]
  • Theorem III.1: Convergence bound of Algorithm 1 for non-convex loss functions
  • Definition 1: Rademacher distribution [degroot2012probability]
  • Proposition III.1: Reducing the variance by changing the distribution of $\boldsymbol{\mathbf{v}}_{k,n}$ in FedScalar algorithm
  • Lemma A.1
  • proof
  • Lemma A.2
  • proof
  • proof: Proof of Lemma III.1
  • ...and 1 more
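
The variance reduction named in Proposition III.1 can be illustrated on a single projection. The sketch below is an assumption-laden illustration, not the proposition's exact statement: it takes the estimator to be $\langle g, v \rangle v$ for a fixed vector $g$, computes $\mathbb{E}\|\langle g, v \rangle v - g\|^2$ exactly for Rademacher $v$ by enumerating all sign vectors, and compares it with the closed form $(d+1)\|g\|^2$ that standard Gaussian moment identities give for $v \sim \mathcal{N}(0, I_d)$. The function name `rademacher_variance` is hypothetical.

```python
from itertools import product

def rademacher_variance(g):
    """Exact E||<g, v> v - g||^2 over all sign vectors v in {-1, +1}^d."""
    d = len(g)
    total = 0.0
    for v in product((-1.0, 1.0), repeat=d):
        s = sum(gi * vi for gi, vi in zip(g, v))  # scalar projection <g, v>
        total += sum((s * vi - gi) ** 2 for gi, vi in zip(g, v))
    return total / 2 ** d  # each sign vector has probability 2^-d

g = [1.0, 2.0, 2.0]
norm_sq = sum(gi * gi for gi in g)        # ||g||^2
var_rademacher = rademacher_variance(g)   # equals (d - 1) * ||g||^2
var_gaussian = (len(g) + 1) * norm_sq     # closed form for Gaussian v
```

The gap comes from $\|v\|^2$: it equals $d$ exactly for a Rademacher vector but fluctuates around $d$ for a Gaussian one, and that fluctuation adds $2\|g\|^2$ to the variance.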