
OledFL: Unleashing the Potential of Decentralized Federated Learning via Opposite Lookahead Enhancement

Qinglun Li, Miao Zhang, Mengzhu Wang, Quanjun Yin, Li Shen

TL;DR

This paper enhances the consistency of DFL by developing an opposite lookahead enhancement technique (Ole), yielding OledFL to optimize the initialization of each client in each communication round, thus significantly improving both the generalization and convergence speed.

Abstract

Decentralized Federated Learning (DFL) offers faster training, better privacy preservation, and lighter communication than Centralized Federated Learning (CFL), making it a promising alternative in the field of federated learning. However, DFL still lags behind CFL in generalization ability: its theoretical understanding is limited, and its empirical performance degrades because of severe inconsistency among clients. In this paper, we enhance the consistency of DFL by developing an opposite lookahead enhancement technique (Ole), yielding OledFL, which optimizes the initialization of each client in each communication round and thereby significantly improves both generalization and convergence speed. Moreover, we rigorously establish its convergence rate in the non-convex setting and characterize its generalization bound through uniform stability, which provides concrete reasons why OledFL achieves both fast convergence and strong generalization. Extensive experiments on the CIFAR10 and CIFAR100 datasets with Dirichlet and Pathological distributions show that OledFL can achieve up to 5\% performance improvement and 8$\times$ speedup compared to DFedAvg, the most popular optimizer in DFL.

Paper Structure

This paper contains 26 sections, 13 theorems, 86 equations, 8 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

Under Assumptions \ref{as:smoothness}-\ref{as:bounded_heterogeneity}, let the learning rate satisfy $\eta \leq \frac{1}{K^{3/2}LB}$ where $K \geq 2$, let the Ole parameter satisfy $\beta \leq \min\{\frac{\sqrt{10}(1-\psi)}{40}, \frac{\sqrt{5}}{30}\}$, and after training $T$ rounds, the averaged model parameters gene… where $\kappa \in (0,1)$ is a constant and $\alpha(\eta,K,L,\psi) = \frac{9}{\kappa}\eta^2 K^2 L^2 \cdots$
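The two hyperparameter conditions in Theorem 1 can be checked numerically. The sketch below is only an illustration of the arithmetic: the function name `theorem1_bounds` is hypothetical, and since $B$ and $\psi$ are not defined in this excerpt they are treated as plain inputs, with $B > 0$ and $\psi \in [0, 1)$ assumed.

```python
import math


def theorem1_bounds(K, L, B, psi):
    """Return the largest eta and beta allowed by the conditions quoted in Theorem 1.

    K   : number of local steps (the theorem requires K >= 2)
    L   : smoothness constant
    B   : constant from the learning-rate condition (assumed positive; not defined in this excerpt)
    psi : constant from the beta condition (assumed to lie in [0, 1); not defined in this excerpt)
    """
    if K < 2:
        raise ValueError("Theorem 1 requires K >= 2")
    if L <= 0 or B <= 0:
        raise ValueError("L and B are assumed to be positive")
    if not 0 <= psi < 1:
        raise ValueError("psi is assumed to lie in [0, 1)")

    # eta <= 1 / (K^{3/2} * L * B)
    eta_max = 1.0 / (K ** 1.5 * L * B)
    # beta <= min{ sqrt(10) * (1 - psi) / 40, sqrt(5) / 30 }
    beta_max = min(math.sqrt(10) * (1 - psi) / 40, math.sqrt(5) / 30)
    return eta_max, beta_max


# Illustrative values only; they are not taken from the paper's experiments.
eta_max, beta_max = theorem1_bounds(K=4, L=1.0, B=1.0, psi=0.5)
print(f"eta <= {eta_max:.4f}, beta <= {beta_max:.4f}")
```

Note that the second term of the minimum, $\frac{\sqrt{5}}{30} \approx 0.0745$, caps $\beta$ regardless of $\psi$, while the first term tightens the cap as $\psi$ approaches 1.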

Figures (8)

  • Figure 1: Simulated optimization trajectories for two clients under the DFedAvg and OledFL algorithms. Because of the Ole term $\mathbf{x}_i^t - \mathbf{x}_{i,K}^{t-1}$, where $\mathbf{x}_{i,K}^{t-1} \approx \mathbf{x}_{i}^*$, the Ole initial point in OledFL corresponds to taking a step back along the direction from the optimization starting point to the client's local optimum. Furthermore, the lengths of the dashed lines in the figure show that Ole significantly reduces the inconsistency during the optimization process.
  • Figure 2: Test accuracy of all baselines on CIFAR-10 in both IID and different non-IID settings.
  • Figure 3: Test accuracy of all baselines on CIFAR-100 in both IID and different non-IID settings.
  • Figure 4: Comparison of the loss landscapes of DFedSAM and OledFL-SAM. The wireframe represents the loss landscape of OledFL-SAM, and the surface represents the loss landscape of OledFL. It is clear that OledFL-SAM can find smoother minima.
  • Figure 5: (a) and (b) depict the comparison of loss landscapes between DFedSAM and OledFL-SAM, while (c) and (d) show the contour plots of the loss landscapes of DFedSAM and OledFL-SAM. From (c) and (d), it can be observed that OledFL-SAM optimizes deeper than DFedSAM. As shown in Figure \ref{fig:comprision}, a comparison in (a) and (b) of Figure \ref{fig:losslandscape} indicates that OledFL-SAM is able to find a flatter loss surface.
  • ...and 3 more figures
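
The Figure 1 caption describes the Ole initial point as a step back from the round's starting point $\mathbf{x}_i^t$, away from the previous local end point $\mathbf{x}_{i,K}^{t-1}$. A minimal sketch of that idea follows. The exact update rule is not given in this excerpt, so the form $\mathbf{x}_i^t + \beta(\mathbf{x}_i^t - \mathbf{x}_{i,K}^{t-1})$ and the name `ole_init` are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def ole_init(x_start, x_prev_end, beta):
    """Hypothetical Ole initialization (assumed form, not taken from the paper's algorithm).

    x_start    : client parameters at the start of round t (x_i^t)
    x_prev_end : client parameters after K local steps in round t-1 (x_{i,K}^{t-1})
    beta       : Ole parameter controlling the size of the backward step
    """
    # (x_start - x_prev_end) points away from the previous local end point,
    # so adding a beta-scaled copy moves the initial point one step back.
    return x_start + beta * (x_start - x_prev_end)


# Toy 2-D example with made-up numbers.
x_start = np.array([1.0, 2.0])
x_prev_end = np.array([2.0, 0.0])
x0 = ole_init(x_start, x_prev_end, beta=0.05)
print(x0)
```

With `beta=0` the function returns `x_start` unchanged, which in this sketch corresponds to starting local training directly from the round's starting point.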

Theorems & Definitions (28)

  • Definition 1
  • Theorem 1
  • Remark 1
  • Remark 2
  • Definition 2
  • Theorem 2
  • Remark 3
  • Remark 4
  • Lemma 1
  • Proof 1
  • ...and 18 more