Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration
Qinglun Li, Miao Zhang, Yingqi Liu, Quanjun Yin, Li Shen, Xiaochun Cao
TL;DR
DFedCata tackles slow convergence and weak generalization in decentralized federated learning (DFL) by combining Moreau envelope smoothing with Nesterov extrapolation during aggregation. The approach yields a provable optimization rate and, for the first time in DFL, a stability-based generalization bound that highlights how hyperparameters (notably the Catalyst parameter $\beta$) and topology influence performance. Empirically, it achieves up to 8.6x faster convergence and at least a 3% improvement in generalization on CIFAR-10/100 across non-iid partitions, outperforming state-of-the-art baselines. The work provides practical guidance on hyperparameters, data partitioning, and topology for deploying DFedCata effectively in real-world decentralized learning systems.
Abstract
Decentralized Federated Learning has emerged as an alternative to centralized architectures due to its faster training, privacy preservation, and reduced communication overhead. In decentralized communication, the server aggregation phase of Centralized Federated Learning shifts to the client side, so clients connect with each other in a peer-to-peer manner. However, compared to the centralized mode, data heterogeneity in Decentralized Federated Learning causes larger variance between aggregated models, which leads to slow convergence during training and poor generalization at test time. To address these issues, we introduce Catalyst Acceleration and propose an accelerated Decentralized Federated Learning algorithm called DFedCata. It consists of two main components: the Moreau envelope function, which primarily addresses parameter inconsistencies among clients caused by data heterogeneity, and Nesterov's extrapolation step, which accelerates the aggregation phase. Theoretically, we prove an optimization error bound and a generalization error bound for the algorithm, which clarify its behavior and give a theoretical perspective on hyperparameter choice. Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR-10/100 with various non-iid data distributions. Furthermore, we experimentally verify the theoretical properties of DFedCata.
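The two components named in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the abstract does not specify where the proximal term is anchored or exactly how the extrapolation is applied, so the choices below are assumptions. Each client takes a few gradient steps on its local loss plus a Moreau-envelope-style penalty (1/(2·lam))·||x − anchor||², clients then average with their neighbors through a mixing matrix, and a Nesterov-style extrapolation is applied to the averaged models. All names (`lam`, `beta`, `ring_mixing_matrix`, `catalyst_round`, `run_demo`) and the quadratic local losses are hypothetical.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring (assumes n >= 3):
    each client averages itself and its two neighbors with weight 1/3."""
    eye = np.eye(n)
    return (eye + np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1)) / 3.0


def local_prox_steps(anchor, grad_fn, lam, lr, steps):
    """Gradient descent on f_i(x) + (1 / (2 * lam)) * ||x - anchor||^2,
    started at the anchor. The penalty keeps the local model near the anchor."""
    x = anchor.copy()
    for _ in range(steps):
        x = x - lr * (grad_fn(x) + (x - anchor) / lam)
    return x


def catalyst_round(y, z_prev, grad_fns, W, lam, lr, steps, beta):
    """One communication round (assumed ordering: local, gossip, extrapolate).

    y:      (n, d) extrapolated models, one row per client
    z_prev: (n, d) aggregated models from the previous round
    """
    # 1) Penalized local training, anchored at each client's current point.
    local = np.stack(
        [local_prox_steps(y[i], grad_fns[i], lam, lr, steps) for i in range(len(grad_fns))]
    )
    # 2) Peer-to-peer aggregation with neighbors.
    z = W @ local
    # 3) Nesterov-style extrapolation on the aggregated models.
    y_next = z + beta * (z - z_prev)
    return y_next, z


def run_demo(rounds=100, n=4, d=3, seed=0):
    """Heterogeneous toy problem: f_i(x) = 0.5 * ||x - c_i||^2 with distinct c_i.
    The minimizer of the summed loss is the mean of the c_i."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(size=(n, d))
    grad_fns = [lambda x, c=c: x - c for c in centers]
    W = ring_mixing_matrix(n)
    y = np.zeros((n, d))
    z = np.zeros((n, d))
    for _ in range(rounds):
        y, z = catalyst_round(y, z, grad_fns, W, lam=1.0, lr=0.1, steps=5, beta=0.5)
    return z, centers


if __name__ == "__main__":
    z, centers = run_demo()
    print("network average:", z.mean(axis=0))
    print("global minimizer:", centers.mean(axis=0))
```

In this toy setting the network-average model converges to the minimizer of the summed loss, while individual clients can still differ slightly because their local optima differ. Varying `beta` and `lam` gives a rough feel for the hyperparameter sensitivity the abstract refers to, but it says nothing about the paper's actual rates or bounds.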
