FedCluster: Boosting the Convergence of Federated Learning via Cluster-Cycling

Cheng Chen, Ziyi Chen, Yi Zhou, Bhavya Kailkhura

TL;DR

It is shown that FedCluster, with the devices implementing the local stochastic gradient descent (SGD) algorithm, achieves a faster convergence rate than the conventional federated averaging (FedAvg) algorithm in the presence of device-level data heterogeneity.

Abstract

We develop FedCluster, a novel federated learning framework with improved optimization efficiency, and investigate its theoretical convergence properties. FedCluster groups the devices into multiple clusters that perform federated learning cyclically in each learning round. Therefore, each learning round of FedCluster consists of multiple cycles of meta-updates, which boost the overall convergence. In nonconvex optimization, we show that FedCluster with the devices implementing the local stochastic gradient descent (SGD) algorithm achieves a faster convergence rate than the conventional federated averaging (FedAvg) algorithm in the presence of device-level data heterogeneity. We conduct experiments on deep learning applications and demonstrate that FedCluster converges significantly faster than conventional federated learning under diverse levels of device-level data heterogeneity for a variety of local optimizers.
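The cluster-cycling round described in the abstract can be illustrated with a minimal sketch, contrasted with a plain FedAvg round. This is not the authors' implementation. It assumes scalar quadratic device losses $0.5\,a_i(x-b_i)^2$ (a convex toy, whereas the paper's analysis is nonconvex), a fixed interleaved split of the devices into M clusters, a constant learning rate, and Gaussian noise standing in for minibatch sampling. Reading M as the number of clusters and E as the number of local steps, as in Theorem 1, is also an assumption, and the names Device, local_sgd, fedavg_round, fedcluster_round and train are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Device:
    """Hypothetical device with local loss 0.5 * a * (x - b) ** 2."""
    a: float  # curvature of the local loss
    b: float  # local minimizer; differing b values model data heterogeneity

    def grad(self, x: float) -> float:
        return self.a * (x - self.b)


def global_loss(x: float, devices: List[Device]) -> float:
    """Federated objective: the average of the device losses."""
    return sum(0.5 * d.a * (x - d.b) ** 2 for d in devices) / len(devices)


def local_sgd(x: float, device: Device, E: int, lr: float,
              noise_std: float, rng: random.Random) -> float:
    """E local SGD steps; Gaussian noise stands in for minibatch sampling."""
    for _ in range(E):
        x -= lr * (device.grad(x) + rng.gauss(0.0, noise_std))
    return x


def fedavg_round(x: float, devices: List[Device], E: int, lr: float,
                 noise_std: float, rng: random.Random) -> float:
    """One FedAvg round: all devices start from x and the server averages."""
    local_models = [local_sgd(x, d, E, lr, noise_std, rng) for d in devices]
    return sum(local_models) / len(local_models)


def fedcluster_round(x: float, devices: List[Device], M: int, E: int,
                     lr: float, noise_std: float, rng: random.Random) -> float:
    """One cluster-cycling round: the M clusters take turns, and each cluster's
    FedAvg-style meta-update starts from the previous cluster's output."""
    clusters = [devices[k::M] for k in range(M)]  # assumed interleaved split
    for cluster in clusters:
        x = fedavg_round(x, cluster, E, lr, noise_std, rng)
    return x


def train(round_fn: Callable[..., float], x0: float, T: int, **kwargs) -> float:
    """Run T learning rounds of the given round function from x0."""
    x = x0
    for _ in range(T):
        x = round_fn(x, **kwargs)
    return x


if __name__ == "__main__":
    data_rng = random.Random(0)
    devices = [Device(a=data_rng.uniform(0.5, 2.0), b=data_rng.uniform(-1.0, 1.0))
               for _ in range(20)]
    common = dict(devices=devices, E=5, lr=0.1, noise_std=0.1)
    x_avg = train(fedavg_round, 10.0, T=20, rng=random.Random(1), **common)
    x_cyc = train(fedcluster_round, 10.0, T=20, rng=random.Random(1), M=4, **common)
    print("FedAvg loss:", global_loss(x_avg, devices))
    print("Cluster-cycling loss (M=4):", global_loss(x_cyc, devices))
```

With M = 1 the cycle contains a single cluster holding all devices, so fedcluster_round reduces to fedavg_round. With M > 1 every device still runs E local steps once per round, but the server performs M sequential meta-updates instead of one.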

Paper Structure

This paper contains 16 sections, 3 theorems, 19 equations, 6 figures, 1 algorithm.

Key Result

Theorem 1

Let Assumption [assum:P] hold and assume that $f(\cdot;\xi)$ is nonconvex for any data sample $\xi$. Choose the learning rate $\eta_{j,K,t}\equiv(TME)^{-\frac{1}{2}}$ and choose $E, M, T$ such that $ME \le \frac{C}{8LG^2}$ and $T\ge L^2\max(1, \frac{16}{EM})$. Then, under full participation of the devices, the output … where the constant $C$ is defined as … Furthermore, in order to achieve a solution that satisfies …
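The step-size choice and the parameter conditions in Theorem 1 can be written as two small helper functions. This is a hedged sketch: the function names are hypothetical, reading L and G as the smoothness and gradient-bound constants of the assumption is our guess, and C is defined in the elided part of the statement, so it is treated here as a user-supplied number.

```python
def theorem1_step_size(T: int, M: int, E: int) -> float:
    """Constant learning rate eta = (T * M * E) ** (-1/2) chosen in Theorem 1."""
    return (T * M * E) ** -0.5


def theorem1_conditions_hold(T: int, M: int, E: int,
                             L: float, G: float, C: float) -> bool:
    """Check M*E <= C / (8 L G^2) and T >= L^2 * max(1, 16 / (E M))."""
    small_local_work = M * E <= C / (8.0 * L * G ** 2)
    enough_rounds = T >= L ** 2 * max(1.0, 16.0 / (E * M))
    return small_local_work and enough_rounds


if __name__ == "__main__":
    # Illustrative constants only; C comes from the elided part of the theorem.
    T, M, E = 10_000, 4, 5
    print("eta =", theorem1_step_size(T, M, E))
    print("conditions hold:",
          theorem1_conditions_hold(T, M, E, L=1.0, G=1.0, C=320.0))
```

Since $\eta = (TME)^{-1/2}$, doubling the number of rounds T shrinks the step size by a factor of $\sqrt{2}$, while the first condition caps the product $ME$.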

Figures (6)

  • Figure 1: Left: Traditional federated learning system. Right: FedCluster system with cluster-cycling.
  • Figure 2: Comparison between FedCluster and FedAvg under different device-level data heterogeneities on CIFAR-10.
  • Figure 3: Comparison between FedCluster and FedAvg under different device-level data heterogeneities on MNIST.
  • Figure 4: Comparison between FedCluster and FedAvg under different local optimizers on CIFAR-10 (left) and MNIST (right).
  • Figure 5: Comparison between FedCluster and FedAvg under different numbers of clusters on CIFAR-10 (top) and MNIST (bottom).
  • ...and 1 more figure

Theorems & Definitions (5)

  • Theorem 1
  • Lemma 1
  • proof
  • Lemma 2
  • proof