Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum

Yuan Zhou, Xinli Shi, Xuelong Li, Jiachen Zhong, Guanghui Wen, Jinde Cao

TL;DR

DEPOSITUM addresses decentralized nonconvex composite federated learning (DNCFL) by fusing proximal gradient tracking with momentum on a time-varying network, handling weakly convex regularizers and enabling local updates. The method achieves an expected $\epsilon$-stationary point with iteration complexity $\mathcal{O}(1/\epsilon^{2})$, while proximal-gradient, consensus, and gradient-estimation errors decay at rate $\mathcal{O}(1/T)$; with proper parameter choices, network-independent linear speedup is possible without mega-batches. Empirical results on neural networks with real-world datasets demonstrate robustness to data heterogeneity and favorable hyperparameter tradeoffs, outpacing server-based baselines. This work advances decentralized federated optimization for nonconvex composite objectives and offers a practical, communication-efficient training framework with provable guarantees.

Abstract

Decentralized Federated Learning (DFL) eliminates the reliance on the server-client architecture inherent in traditional federated learning and has attracted significant research interest in recent years. At the same time, the objective functions in machine learning tasks are often nonconvex and frequently incorporate additional, potentially nonsmooth regularization terms to satisfy practical requirements, thereby forming nonconvex composite optimization problems. Employing DFL methods to solve such general optimization problems leads to the formulation of Decentralized Nonconvex Composite Federated Learning (DNCFL), a topic that remains largely underexplored. In this paper, we propose a novel DNCFL algorithm, termed DEPOSITUM. Built upon proximal stochastic gradient tracking, DEPOSITUM mitigates the impact of data heterogeneity by enabling clients to approximate the global gradient. Introducing momentum in the proximal gradient descent step, in place of the tracking variables, reduces the variance introduced by stochastic gradients. Additionally, DEPOSITUM supports local updates of client variables, significantly reducing communication costs. Theoretical analysis demonstrates that DEPOSITUM achieves an expected $\epsilon$-stationary point with an iteration complexity of $\mathcal{O}(1/\epsilon^{2})$. The proximal gradient, consensus errors, and gradient estimation errors decrease at a sublinear rate of $\mathcal{O}(1/T)$. With appropriate parameter selection, the algorithm achieves network-independent linear speedup without requiring mega-batch sampling. Finally, we apply DEPOSITUM to the training of neural networks on real-world datasets, systematically examining the influence of various hyperparameters on its performance. Comparisons with other federated composite optimization algorithms validate the effectiveness of the proposed method.
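
To make the ingredients named in the abstract concrete, the following is a minimal sketch of a generic decentralized proximal gradient-tracking loop with a momentum buffer. It is not the DEPOSITUM update rule itself. The order of the steps, the exponential-averaging form of the momentum, the fixed ring topology, the $\ell_1$ regularizer (convex rather than weakly convex), the toy least-squares losses, and every function and parameter name are assumptions made for illustration. Local updates between communication rounds are omitted.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (assumed regularizer for this sketch)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a fixed ring of n >= 3 clients."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W

def stochastic_grads(X, A, b, batch, rng):
    """Mini-batch least-squares gradient per client (toy local loss)."""
    G = np.empty_like(X)
    for i in range(X.shape[0]):
        idx = rng.choice(A[i].shape[0], size=batch, replace=False)
        residual = A[i][idx] @ X[i] - b[i][idx]
        G[i] = A[i][idx].T @ residual / batch
    return G

def run(n=5, d=4, m=20, T=100, alpha=0.02, gamma=0.9, lam=0.01, batch=5, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n, m, d))   # synthetic, heterogeneous local data
    b = rng.normal(size=(n, m))
    W = ring_mixing_matrix(n)
    X = np.zeros((n, d))             # row i is client i's model
    G = stochastic_grads(X, A, b, batch, rng)
    Y = G.copy()                     # gradient tracker, initialized at local gradients
    V = Y.copy()                     # momentum buffer (assumed exponential average)
    for _ in range(T):
        V = gamma * V + (1.0 - gamma) * Y                    # momentum on the tracker
        X = W @ soft_threshold(X - alpha * V, alpha * lam)   # proximal step, then mixing
        G_new = stochastic_grads(X, A, b, batch, rng)
        Y = W @ (Y + G_new - G)                              # gradient tracking update
        G = G_new
    return X, Y, G, W
```

Because the mixing matrix is doubly stochastic and the tracker starts at the local gradients, the average of the rows of `Y` equals the average of the current local stochastic gradients at every iteration. This is the sense in which each client approximates the global gradient.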

Paper Structure

This paper contains 40 sections, 10 theorems, 115 equations, 7 figures, 3 tables, 1 algorithm.

Key Result

Proposition 1

Suppose that Assumptions ass:AssumptionFunction, ass:AssumptionMixingMatrix, and ass:AssumptionStochastic hold. If $0<\alpha\leqslant\min\{\frac{1}{16L}, \frac{1}{48\rho}\}$, then the following inequality holds:

Figures (7)

  • Figure 1: Network topologies with 10 clients.
  • Figure 2: The proportions of samples corresponding to 10 classes of labels distributed across 10 clients under different Dirichlet distributions. The $x$-axis of this bar chart represents each client, while the $y$-axis shows the proportion of samples from each class assigned to that client. The colors indicate different labels, and the total proportion for each class across clients sums to 1. With $Dir(0.1)$ (b), the distributions are more uneven across clients, indicating greater data heterogeneity, whereas $Dir(1)$ (a) produces more balanced distributions. In the case of uniformly distributed (IID) local data, each bar would reach a height of 1, evenly divided into 10 colors.
  • Figure 3: Effect of step size parameters $\alpha$ and $\beta$ on DEPOSITUM. In this figure, the $x$-axis represents the number of iterations $t$, while the $y$-axis denotes the loss function (a), gradient estimation errors (b), proximal gradient (c), and the consensus errors of $\mathbf{x}$ (d), $\mathbf{y}$ (e), and $\bm{\nu}$ (f), respectively.
  • Figure 4: Effect of momentum parameter $\gamma$ on DEPOSITUM. In this figure, the $x$-axis represents the number of iterations $t$, while the $y$-axis shows the training loss (a) and test accuracy (b) of the model after several iterations.
  • Figure 5: Effect of communication period $T_0$ on DEPOSITUM. In this figure, the $x$-axis represents the number of communications/iterations $t$, while the $y$-axis shows the training loss (a), test accuracy (b) and the consensus errors of $\mathbf{x}$ (c) in the training.
  • ...and 2 more figures
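
The label split described in the Figure 2 caption can be sketched as follows. This is a common Dirichlet partitioning recipe and not necessarily the authors' exact procedure. The function name `dirichlet_partition` and its parameters are hypothetical. For each class, the shares assigned to the clients are drawn from a symmetric Dirichlet distribution, so they sum to 1 across clients, and a smaller concentration gives a more uneven split.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, concentration, seed=0):
    """Split sample indices across clients, class by class.

    For each class, client shares are drawn from Dir(concentration, ..., concentration).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        shares = rng.dirichlet(concentration * np.ones(num_clients))
        # Convert cumulative shares into cut points within this class's samples.
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            client_indices[k].extend(part.tolist())
    return client_indices

# Usage: 10 classes with 100 samples each, split across 10 clients.
labels = np.repeat(np.arange(10), 100)
parts_uneven = dirichlet_partition(labels, num_clients=10, concentration=0.1)
parts_balanced = dirichlet_partition(labels, num_clients=10, concentration=1.0)
```

Every sample index is assigned to exactly one client, so the per-class proportions across clients sum to 1, as in the figure.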

Theorems & Definitions (17)

  • Definition 1
  • Remark 1
  • Remark 2
  • Remark 3
  • Definition 2
  • Definition 3
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • ...and 7 more