
FedLion: Faster Adaptive Federated Optimization with Fewer Communication

Zhiwei Tang, Tsung-Hui Chang

TL;DR

FedLion addresses slow convergence and high communication costs in federated learning by adapting the centralized Lion optimizer to FL with periodic averaging. It achieves faster convergence than prior adaptive FL methods and reduces uplink data through sign-based local updates, underpinned by a new bounded heterogeneity assumption and a nonasymptotic convergence rate of $O(T^{-1/2})$ in suitable settings. Empirically, on EMNIST and CIFAR-10, FedLion outperforms FAFED and FedDA while incurring only marginal uplink overhead, with especially strong gains when gradients are dense. The approach offers practical communication savings and accelerated training in large-scale FL deployments, particularly under moderate heterogeneity and high gradient density.

Abstract

In Federated Learning (FL), a framework to train machine learning models across distributed data, well-known algorithms like FedAvg tend to have slow convergence rates, resulting in high communication costs during training. To address this challenge, we introduce FedLion, an adaptive federated optimization algorithm that seamlessly incorporates key elements from the recently proposed centralized adaptive algorithm, Lion (Chen et al. 2023), into the FL framework. Through comprehensive evaluations on two widely adopted FL benchmarks, we demonstrate that FedLion outperforms previous state-of-the-art adaptive algorithms, including FAFED (Wu et al. 2023) and FedDA. Moreover, thanks to the use of signed gradients in local training, FedLion substantially reduces data transmission requirements during uplink communication when compared to existing adaptive algorithms, further reducing communication costs. Finally, this work includes a novel theoretical analysis showing that FedLion attains a faster convergence rate than established FL algorithms like FedAvg.
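
To make the "signed gradients" idea concrete, the following is a minimal sketch of one step of the centralized Lion optimizer (Chen et al. 2023) that FedLion builds on. It is not the FedLion algorithm itself: the function name `lion_step`, the plain-list representation, and the default hyperparameter values are assumptions made for illustration only.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def lion_step(x, m, g, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One centralized Lion step on plain Python lists (illustrative only).

    x: parameters, m: momentum buffer, g: stochastic gradient.
    Returns the new parameters, the new momentum, and the signed update.
    """
    # Interpolate momentum and gradient, then keep only the sign.
    update = [sign(beta1 * mj + (1 - beta1) * gj) for mj, gj in zip(m, g)]
    # Every nonzero coordinate moves by exactly lr (plus decoupled weight decay).
    new_x = [xj - lr * (uj + weight_decay * xj) for xj, uj in zip(x, update)]
    # The momentum buffer is refreshed with a second coefficient, beta2.
    new_m = [beta2 * mj + (1 - beta2) * gj for mj, gj in zip(m, g)]
    return new_x, new_m, update


if __name__ == "__main__":
    x, m, update = lion_step([1.0, -2.0, 0.5], [0.0, 0.0, 0.0],
                             [0.3, -0.7, 0.0], lr=0.1)
    print(x, m, update)
```

Because each entry of the signed update takes only one of three values, it can be encoded far more compactly than a full-precision gradient entry, which is the intuition behind the uplink savings described in the abstract.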

Paper Structure (13 sections, 1 theorem, 5 equations, 3 figures, 1 algorithm)

Key Result

Theorem 1

Suppose that conditions A.1-A.4 of the paper's common assumption hold. Denote $\bar{x}_{t,s} = \frac{1}{N}\sum_{i=1}^N x_{t,s}^i$, $\bar{L} = \sum_{j=1}^d L_j$ and $\bar{\sigma} = \sum_{j=1}^d \sigma_j$. If we set $\gamma=\frac{1}{\sqrt{T}}$, $\beta_1 = 1-\frac{1}{\sqrt{T}}$ and $\beta_2=1-\frac{1}{T}$ for Algo
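
The parameter choices quoted in the theorem statement depend only on the horizon $T$. As a small worked illustration, the sketch below evaluates them for a given $T$; the helper name `theorem1_hyperparameters` and the dictionary layout are hypothetical, and only the three formulas for $\gamma$, $\beta_1$ and $\beta_2$ come from the statement above.

```python
import math


def theorem1_hyperparameters(T):
    """Evaluate gamma, beta1, beta2 as set in the theorem statement for horizon T."""
    if T < 1:
        raise ValueError("T must be a positive integer")
    root = math.sqrt(T)
    return {
        "gamma": 1.0 / root,        # gamma = 1 / sqrt(T)
        "beta1": 1.0 - 1.0 / root,  # beta1 = 1 - 1 / sqrt(T)
        "beta2": 1.0 - 1.0 / T,     # beta2 = 1 - 1 / T
    }


if __name__ == "__main__":
    print(theorem1_hyperparameters(10_000))
```

For example, $T = 10{,}000$ gives $\gamma = 0.01$, $\beta_1 = 0.99$ and $\beta_2 = 0.9999$, so both momentum coefficients approach 1 as the horizon grows, with $\beta_2$ approaching it faster.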

Figures (3)

  • Figure 1: Experimental results on the EMNIST and CIFAR-10 datasets. The first two rows show the results for EMNIST, while the remaining rows show the results for CIFAR-10. The odd rows show the training loss curves, while the even rows show the test accuracy curves. In all plots, the $x$-axis denotes the communication rounds. The columns, arranged from left to right, display the results obtained with $E \in \{5, 10, 20\}$.
  • Figure 2: Gradient density ${\|\tilde{v}_{t,s}\|_1}/{\|\tilde{v}_{t,s}\|_2}$ during training. Left is EMNIST and right is CIFAR-10. The curve is obtained by running FedLion with $E=5$.
  • Figure 3: Empirical distribution of the element values in $\Delta_{t-1}^i$ during training. The first row is on the EMNIST dataset while the second row is on the CIFAR-10 dataset. From left to right are the settings with $E\in\{5,10,20\}$, respectively.
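
The gradient density plotted in Figure 2 is the ratio of the $\ell_1$ norm to the $\ell_2$ norm of a vector. A minimal sketch of that ratio follows, assuming the vector is available as a plain Python list; the function name `gradient_density` is hypothetical. For a nonzero vector of dimension $d$, the ratio lies between 1 (a single nonzero entry) and $\sqrt{d}$ (all entries of equal magnitude).

```python
import math


def gradient_density(v):
    """Return ||v||_1 / ||v||_2 for a nonzero vector given as a list of floats."""
    l1 = sum(abs(a) for a in v)
    l2 = math.sqrt(sum(a * a for a in v))
    if l2 == 0.0:
        raise ValueError("density is undefined for the zero vector")
    return l1 / l2


if __name__ == "__main__":
    print(gradient_density([0.0, 0.0, 3.0, 0.0]))   # sparsest case
    print(gradient_density([1.0, -1.0, 1.0, -1.0]))  # densest case for d = 4
```

A larger value therefore indicates that the mass of the vector is spread over more coordinates, which is the sense in which the summary speaks of "dense" gradients.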

Theorems & Definitions (1)

  • Theorem 1