Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum
Yuan Zhou, Xinli Shi, Xuelong Li, Jiachen Zhong, Guanghui Wen, Jinde Cao
TL;DR
DEPOSITUM addresses decentralized nonconvex composite federated learning (DNCFL) by combining proximal gradient tracking with momentum on a time-varying network, handling weakly convex regularizers and supporting local updates. The method reaches an expected $\epsilon$-stationary point with iteration complexity $\mathcal{O}(1/\epsilon^{2})$, while the proximal gradient, consensus errors, and gradient estimation errors decay at rate $\mathcal{O}(1/T)$; with appropriate parameter choices, it achieves network-independent linear speedup without mega-batch sampling. Experiments training neural networks on real-world datasets examine the influence of the hyperparameters and robustness to data heterogeneity, and compare DEPOSITUM with other federated composite optimization algorithms. The result is a communication-efficient decentralized training method for nonconvex composite objectives with provable guarantees.
Abstract
Decentralized Federated Learning (DFL) eliminates the reliance on the server-client architecture inherent in traditional federated learning and has attracted significant research interest in recent years. At the same time, the objective functions in machine learning tasks are often nonconvex and frequently include additional, potentially nonsmooth regularization terms to satisfy practical requirements, which yields nonconvex composite optimization problems. Applying DFL methods to such general problems leads to Decentralized Nonconvex Composite Federated Learning (DNCFL), a topic that remains largely underexplored. In this paper, we propose a novel DNCFL algorithm, termed DEPOSITUM. Built upon proximal stochastic gradient tracking, DEPOSITUM mitigates the impact of data heterogeneity by enabling clients to approximate the global gradient. Momentum replaces the tracking variables in the proximal gradient descent step, which reduces the variance introduced by stochastic gradients. Additionally, DEPOSITUM supports local updates of client variables, significantly reducing communication costs. Theoretical analysis demonstrates that DEPOSITUM achieves an expected $\epsilon$-stationary point with an iteration complexity of $\mathcal{O}(1/\epsilon^2)$. The proximal gradient, consensus errors, and gradient estimation errors decrease at a sublinear rate of $\mathcal{O}(1/T)$. With appropriate parameter selection, the algorithm achieves network-independent linear speedup without requiring mega-batch sampling. Finally, we apply DEPOSITUM to the training of neural networks on real-world datasets, systematically examining the influence of various hyperparameters on its performance. Comparisons with other federated composite optimization algorithms validate the effectiveness of the proposed method.
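
To make the ingredients named in the abstract concrete, the following is a minimal sketch of a generic decentralized proximal gradient-tracking loop with momentum and periodic communication. It is not the DEPOSITUM update rule itself, which the abstract does not spell out. Everything specific here is an assumption chosen for illustration: the least-squares local losses, the $\ell_1$ regularizer (whose proximal operator is soft-thresholding), the fixed ring network with uniform weights, the order of the updates, and all names and parameter values (`run_sketch`, `alpha`, `beta`, `lam`, `comm_period`).

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (an illustrative regularizer choice)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ring_mixing_matrix(n):
    """Doubly stochastic matrix for a ring: weight 1/3 on self and both neighbors."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


def run_sketch(n=5, d=5, m=20, steps=300, alpha=0.02, beta=0.8,
               lam=0.01, batch=5, comm_period=2, seed=0):
    """Hypothetical toy run: n clients, each holding m samples in dimension d."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n, m, d))   # client i holds (A[i], b[i]); data differ per client
    b = rng.normal(size=(n, m))
    W = ring_mixing_matrix(n)

    def stoch_grad(X):
        # Mini-batch gradient of the local loss 0.5 * mean((A_i x - b_i)^2).
        G = np.empty_like(X)
        for i in range(n):
            idx = rng.choice(m, size=batch, replace=False)
            residual = A[i, idx] @ X[i] - b[i, idx]
            G[i] = A[i, idx].T @ residual / batch
        return G

    X = np.zeros((n, d))   # row i is client i's model
    G = stoch_grad(X)
    Y = G.copy()           # gradient-tracking variables
    M = Y.copy()           # momentum variables
    for t in range(1, steps + 1):
        # Communicate every comm_period steps; otherwise update locally only.
        mix = W if t % comm_period == 0 else np.eye(n)
        # Proximal step driven by the momentum rather than the tracking variable.
        X = soft_threshold(mix @ X - alpha * M, alpha * lam)
        G_new = stoch_grad(X)
        # Tracking update: keeps mean(Y) equal to mean of the latest gradients.
        Y = mix @ (Y + G_new - G)
        G = G_new
        # Momentum as an exponential moving average of the tracking variable.
        M = beta * M + (1.0 - beta) * Y
    return X, Y, G


if __name__ == "__main__":
    X, Y, G = run_sketch()
    print("consensus error:", np.linalg.norm(X - X.mean(axis=0)))
```

Because the mixing matrix is doubly stochastic, the average of the tracking variables always equals the average of the most recent local stochastic gradients, which is the mechanism that lets each client approximate the global gradient despite heterogeneous data. The step size and momentum weight above are untuned placeholders.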
