Delayed Random Partial Gradient Averaging for Federated Learning

Xinyi Hu

TL;DR

Problem: Federated learning faces bandwidth and latency bottlenecks that hinder scalability across edge devices. Approach: DPGA reduces communication by sharing only a dynamic fraction $p^t$ of each client's gradients, with $p^t$ guided by a random-walk model, and it overlaps local computation with communication through delayed aggregation. Per-component masking and Top-K sparsification further reduce the data exchanged. The objective remains $f(w)=\sum_i \frac{n_i}{n} f_i(w)$, with FedAvg-style updates $w_g^{t+1}=\sum_{i\in \mathcal{S}_t} \beta_i^t w_i^t$. Experiments on non-IID CIFAR-10/100 show that DPGA outperforms baselines in accuracy, communication cost, and run time. Significance: DPGA demonstrates practical, scalable FL on edge devices by addressing the bandwidth and latency bottlenecks simultaneously.
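The two building blocks named above, Top-K sparsification of a gradient and the FedAvg-style weighted average, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `top_k_sparsify` and `aggregate` are hypothetical, gradients are plain Python lists, and the weights are assumed to be $\beta_i = n_i / \sum_j n_j$ over the participating clients, which this summary does not state explicitly.

```python
import math


def top_k_sparsify(grad, p):
    """Zero all but the ceil(p * len(grad)) largest-magnitude entries of grad.

    `p` plays the role of the shared fraction p^t (assumed to lie in [0, 1]).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Small tolerance so that, e.g., 0.3 * 10 yields k = 3 rather than 4.
    k = max(0, math.ceil(p * len(grad) - 1e-9))
    # Rank indices by descending magnitude; ties go to the lower index.
    ranked = sorted(range(len(grad)), key=lambda j: (-abs(grad[j]), j))
    kept = set(ranked[:k])
    return [g if j in kept else 0.0 for j, g in enumerate(grad)]


def aggregate(client_vectors, client_sizes):
    """Weighted average of client vectors with beta_i = n_i / sum_j n_j.

    The choice of beta_i is an assumption made for this sketch.
    """
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("client_sizes must sum to a positive number")
    dim = len(client_vectors[0])
    out = [0.0] * dim
    for vec, n_i in zip(client_vectors, client_sizes):
        beta = n_i / total
        for j in range(dim):
            out[j] += beta * vec[j]
    return out


if __name__ == "__main__":
    print(top_k_sparsify([0.1, -2.0, 0.5, 3.0], p=0.5))
    print(aggregate([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3]))
```

In a real system the zeroed entries would not be transmitted at all; only the kept indices and their values would travel over the network, which is where the bandwidth saving comes from.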

Abstract

Federated learning (FL) is a distributed machine learning paradigm that enables multiple clients to train a shared model collaboratively while preserving privacy. However, the scaling of real-world FL systems is often limited by two communication bottlenecks: (a) while the increasing computing power of edge devices enables the deployment of large-scale Deep Neural Networks (DNNs), limited bandwidth constrains the frequent transmission of large DNNs; and (b) high latency greatly degrades the performance of FL. In light of these bottlenecks, we propose Delayed Random Partial Gradient Averaging (DPGA) to enhance FL. Under DPGA, clients share only part of their local model gradients with the server. The size of the shared part of a local model is determined by the update rate, which is coarsely initialized and subsequently refined over the temporal dimension. Moreover, DPGA substantially reduces the system run time by enabling computation in parallel with communication. We conduct experiments on non-IID CIFAR-10/100 to demonstrate the efficacy of our method.
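To make the idea of an update rate that starts from a coarse value and then drifts over rounds more concrete, here is a generic clipped random walk. This is an illustration only: the summary does not give the paper's actual refinement rule, so the step size, the bounds, and the function name `update_rate_walk` are all assumptions introduced for this sketch.

```python
import random


def update_rate_walk(p0, rounds, step=0.05, p_min=0.05, p_max=1.0, seed=0):
    """Return a list of `rounds` update rates following a clipped random walk.

    Hypothetical rule: at each round the rate moves up or down by `step`
    with equal probability and is then clipped to [p_min, p_max].
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    rng = random.Random(seed)  # local generator keeps the walk reproducible
    p = min(p_max, max(p_min, p0))
    rates = [p]
    for _ in range(rounds - 1):
        p += rng.choice((-step, step))
        p = min(p_max, max(p_min, p))
        rates.append(p)
    return rates


if __name__ == "__main__":
    for t, p in enumerate(update_rate_walk(p0=0.5, rounds=5)):
        print(f"round {t}: share fraction {p:.2f}")
```

Each value in the returned list could serve as the fraction of gradient components a client shares in that round.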

Paper Structure (9 sections, 10 equations, 2 figures, 2 tables, 1 algorithm)

Figures (2)

  • Figure 1: (a) Local computing and global updating are conducted sequentially, and only partial gradients are exchanged. (b) Local computing is executed in parallel with global communication, and full gradients are exchanged between the server and the clients. (c) Local computing is executed in parallel with global communication, and only partial gradients are exchanged.
  • Figure 2: A comparison of different methods under non-IID CIFAR-10/100 settings. LG-Fed is fine-tuned from the FedAvg pre-trained model.
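
The three schedules in Figure 1 can be compared with a back-of-the-envelope cost model. The sketch below is an idealized assumption rather than the paper's analysis: it takes a fixed per-round compute time, a full-gradient communication time that shrinks linearly with the shared fraction `p`, a one-round delay between computing and aggregating, and no stragglers. The function names are hypothetical.

```python
def sequential_time(rounds, compute, comm, p=1.0):
    """Total time when each round computes, then communicates (Figure 1a)."""
    return rounds * (compute + p * comm)


def overlapped_time(rounds, compute, comm, p=1.0):
    """Total time when round t's transfer overlaps round t+1's compute.

    The first compute and the last transfer cannot be hidden; every step in
    between costs whichever of the two is slower (Figure 1b with p = 1,
    Figure 1c with p < 1).
    """
    if rounds < 1:
        return 0.0
    transfer = p * comm
    return compute + (rounds - 1) * max(compute, transfer) + transfer


if __name__ == "__main__":
    T, c, m, p = 10, 2.0, 4.0, 0.25  # illustrative numbers, not measurements
    print("sequential, full:   ", sequential_time(T, c, m))        # 60.0
    print("sequential, partial:", sequential_time(T, c, m, p))     # 30.0
    print("overlapped, full:   ", overlapped_time(T, c, m))        # 42.0
    print("overlapped, partial:", overlapped_time(T, c, m, p))     # 21.0
```

Under these assumptions, overlapping alone helps only until the slower of the two phases dominates, which is why combining it with partial exchange gives the shortest schedule in this toy example.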