Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration

Qinglun Li, Miao Zhang, Yingqi Liu, Quanjun Yin, Li Shen, Xiaochun Cao

TL;DR

DFedCata tackles slow convergence and weak generalization in decentralized federated learning by marrying Moreau envelope smoothing with Nesterov extrapolation during aggregation. The approach yields a provable optimization rate and, for the first time in DFL, a stability-based generalization bound that highlights how hyperparameters (notably the Catalyst parameter $\beta$) and topology influence performance. Empirically, it achieves up to 8.6x faster convergence and at least 3% improved generalization on CIFAR-10/100 across non-iid partitions, outperforming state-of-the-art baselines. The work provides practical guidance on hyperparameters, data partitioning, and topology to deploy DFedCata effectively in real-world decentralized learning systems.

Abstract

Decentralized Federated Learning has emerged as an alternative to centralized architectures due to its faster training, privacy preservation, and reduced communication overhead. In decentralized communication, the server aggregation phase of Centralized Federated Learning shifts to the client side, so clients connect with each other in a peer-to-peer manner. However, compared to the centralized mode, data heterogeneity in Decentralized Federated Learning causes larger variance between aggregated models, which leads to slow convergence during training and poor generalization at test time. To address these issues, we introduce Catalyst Acceleration and propose an accelerated Decentralized Federated Learning algorithm called DFedCata. It consists of two main components: the Moreau envelope function, which primarily addresses parameter inconsistencies among clients caused by data heterogeneity, and Nesterov's extrapolation step, which accelerates the aggregation phase. Theoretically, we prove an optimization error bound and a generalization error bound for the algorithm, which clarify its behavior and give a theoretical perspective on hyperparameter choice. Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions. Furthermore, we experimentally verify the theoretical properties of DFedCata.
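The two components named in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the paper's Algorithm 1: it assumes scalar quadratic client losses $f_i(x) = \frac{1}{2}(x - c_i)^2$, a ring topology with uniform mixing weights, and a particular ordering (proximal local steps, then gossip averaging, then extrapolation). The function names and default hyperparameter values are hypothetical and chosen only for illustration.

```python
def local_prox_steps(anchor, center, eta, lam, K):
    """K gradient steps on f_i(x) + (lam/2) * (x - anchor)^2.

    The quadratic penalty is the Moreau-envelope-style term: it keeps the
    local model close to the point the client started the round from.
    """
    x = anchor
    for _ in range(K):
        grad = (x - center) + lam * (x - anchor)
        x -= eta * grad
    return x


def ring_gossip(models):
    """Average each client with its two ring neighbours (weights 1/3 each)."""
    m = len(models)
    return [
        (models[i - 1] + models[i] + models[(i + 1) % m]) / 3.0
        for i in range(m)
    ]


def run_sketch(centers, rounds=100, eta=0.1, lam=0.5, K=5, beta=0.5):
    """Toy decentralized loop; assumes at least 3 clients for the ring."""
    m = len(centers)
    z_prev = [0.0] * m  # aggregated models from the previous round
    y = [0.0] * m       # extrapolated starting points for local training
    for _ in range(rounds):
        # 1) local training with the proximal penalty
        x = [local_prox_steps(y[i], centers[i], eta, lam, K) for i in range(m)]
        # 2) peer-to-peer aggregation
        z = ring_gossip(x)
        # 3) Nesterov-style extrapolation on the aggregated models
        y = [z[i] + beta * (z[i] - z_prev[i]) for i in range(m)]
        z_prev = z
    return z_prev


if __name__ == "__main__":
    final = run_sketch([1.0, 2.0, 3.0, 4.0, 10.0])
    print("client models:", final)
    print("average model:", sum(final) / len(final))
```

In this toy setting the average of the client models approaches the minimizer of the average loss (the mean of the $c_i$), while individual client values generally remain spread around that average because the local losses differ.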

Paper Structure

This paper contains 31 sections, 18 theorems, 85 equations, 5 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

Under the assumptions from smoothness through bounded heterogeneity, let $\eta < \min\left\{\frac{(1-\beta)^{\frac{3}{2}}}{2KL\sqrt{2L}}, \frac{(1-\beta)^2}{KL}, \frac{\sqrt{1-\beta}}{2KL\sqrt{1+B^2}}\right\}$ and $\widetilde{\eta} = \frac{\gamma}{\lambda(1-\beta)}$, where $\gamma = 1-(1-\eta\lambda)^K$. It is obvious that ... where $\kappa > \frac{1}{2}$ is a constant and $\mu = 1 - \frac{4L^2}{1-\beta}\frac{\gamma^2}{\lambda\ldots}$ ...
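The constants in Theorem 1 can be evaluated directly. The sketch below computes the step-size bound, $\gamma$, and $\widetilde{\eta}$ exactly as written in the visible part of the statement. It assumes that $L$ is the smoothness constant and $B$ the heterogeneity constant from the referenced assumptions, which this excerpt does not spell out; the function names and sample values are hypothetical. It also checks a geometric-series identity, $\frac{\gamma}{\lambda} = \eta \sum_{k=0}^{K-1} (1-\eta\lambda)^k$, which offers one possible reading of $\gamma$ as the accumulated effect of $K$ local steps damped by the penalty $\lambda$.

```python
import math


def step_size_bound(beta, K, L, B):
    """Upper bound on eta: the minimum of the three terms in Theorem 1."""
    return min(
        (1 - beta) ** 1.5 / (2 * K * L * math.sqrt(2 * L)),
        (1 - beta) ** 2 / (K * L),
        math.sqrt(1 - beta) / (2 * K * L * math.sqrt(1 + B ** 2)),
    )


def gamma(eta, lam, K):
    """gamma = 1 - (1 - eta * lam)^K."""
    return 1 - (1 - eta * lam) ** K


def effective_step(eta, lam, K, beta):
    """eta_tilde = gamma / (lam * (1 - beta))."""
    return gamma(eta, lam, K) / (lam * (1 - beta))


if __name__ == "__main__":
    # Sample values are illustrative assumptions, not taken from the paper.
    beta, K, L, B, lam = 0.5, 5, 1.0, 1.0, 0.1
    eta_max = step_size_bound(beta, K, L, B)
    eta = 0.5 * eta_max  # any value strictly below the bound

    g = gamma(eta, lam, K)
    series = eta * sum((1 - eta * lam) ** k for k in range(K))

    print("eta bound        :", eta_max)
    print("gamma            :", g)
    print("gamma / lambda   :", g / lam)
    print("eta * geom. sum  :", series)
    print("eta_tilde        :", effective_step(eta, lam, K, beta))
```

For small $\eta\lambda$, $\gamma/\lambda$ is close to $K\eta$, so in that regime $\widetilde{\eta}$ is approximately $K\eta/(1-\beta)$.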

Figures (5)

  • Figure 1: Simulated optimization trajectories for two clients under the DFedAvg and DFedCata algorithms. DFedCata improves on DFedAvg in two respects. First, the Moreau envelope reduces the inconsistency between clients (the black dashed line is shorter for DFedCata). Second, the Nesterov acceleration during the aggregation stage brings $\bar{\mathbf{x}}$ significantly closer to the optimal value $\mathbf{x}^*$.
  • Figure 2: Test accuracy of all baselines on CIFAR-10 in both IID and different non-IID settings.
  • Figure 3: Test accuracy of all baselines on CIFAR-100 in both IID and different non-IID settings.
  • Figure 4: Accuracy of different DFL algorithms with different decentralized topologies on the test dataset.
  • Figure 5: Hyperparameter sensitivity: local epochs $K$, Catalyst parameter $\beta$, number of participating clients $m$, penalty parameter $\lambda$.

Theorems & Definitions (38)

  • Definition 1
  • Theorem 1
  • Remark 1
  • Theorem 2
  • Remark 2
  • Lemma 1
  • Proof 1
  • Lemma 2
  • Proof 2
  • Lemma 3
  • ...and 28 more