
Decentralized Conjugate Gradient and Memoryless BFGS Methods

Liping Wang, Hao Wu, Hongchao Zhang

TL;DR

This work addresses decentralized optimization over a connected network by introducing two methods: NDCG, a gradient-tracking enhanced decentralized conjugate gradient designed for nonconvex objectives, and DMBFGS, a memoryless BFGS variant for strongly convex objectives. NDCG achieves global convergence with a constant stepsize by integrating gradient tracking and a PRP-type conjugate parameter, while DMBFGS yields global linear convergence through adaptive self-scaling quasi-Newton updates based solely on gradient information. The authors prove convergence results under mild assumptions, including Lipschitz gradients and strong convexity, and analyze rates in terms of problem and network condition numbers. Numerical experiments on nonconvex logistic regression and strongly convex linear/logistic regression demonstrate that NDCG and DMBFGS outperform state-of-the-art decentralized first-order and quasi-Newton methods in both iteration and communication efficiency, highlighting practical impact for scalable distributed optimization.
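To make the gradient-tracking idea concrete, here is a minimal sketch of a generic gradient-tracking iteration (in the style of DIGing). It is not the paper's NDCG update: there is no conjugate direction and no PRP parameter. The toy objective, the 4-node ring, its equal weights, and the stepsize are all assumptions chosen for illustration.

```python
# Generic gradient tracking, NOT the paper's NDCG method.
# Assumed toy problem: node i holds f_i(x) = 0.5 * (x - b[i])**2, so the
# minimizer of sum_i f_i is the mean of b.

def mix(W, v):
    """Apply the mixing matrix W to a vector of per-node scalars."""
    n = len(v)
    return [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]

def gradient_tracking(W, b, alpha=0.1, iters=500):
    n = len(b)

    def grad(x):
        # Local gradients: node i only evaluates the gradient of f_i.
        return [x[i] - b[i] for i in range(n)]

    x = [0.0] * n
    y = grad(x)  # the tracker is initialized with the local gradients
    for _ in range(iters):
        # Mix with neighbors, then step along the tracked direction.
        x_new = [wx - alpha * yi for wx, yi in zip(mix(W, x), y)]
        g_old, g_new = grad(x), grad(x_new)
        # y tracks the network-average gradient via neighbor exchanges only.
        y = [wy + gn - go for wy, gn, go in zip(mix(W, y), g_new, g_old)]
        x = x_new
    return x

# Assumed 4-node ring with equal weights 1/3 (symmetric, doubly stochastic).
W = [[1/3, 1/3, 0.0, 1/3],
     [1/3, 1/3, 1/3, 0.0],
     [0.0, 1/3, 1/3, 1/3],
     [1/3, 0.0, 1/3, 1/3]]
b = [1.0, 2.0, 3.0, 6.0]
x_final = gradient_tracking(W, b)
```

With a constant stepsize, every entry of `x_final` should approach the mean of `b`, which is the behavior that gradient tracking is meant to provide without a diminishing stepsize.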

Abstract

This paper proposes a new decentralized conjugate gradient (NDCG) method and a decentralized memoryless BFGS (DMBFGS) method for the nonconvex and strongly convex decentralized optimization problems, respectively, of minimizing a finite sum of continuously differentiable functions over a fixed, connected, undirected network. Gradient tracking techniques are applied in both methods to enhance their convergence properties and numerical stability. In particular, we show global convergence of NDCG with constant stepsize for general nonconvex smooth decentralized optimization. Our new DMBFGS method uses a scaled memoryless BFGS technique and requires only gradient information to approximate second-order information of the component functions in the objective. We also establish global convergence and a linear convergence rate of DMBFGS with constant stepsize for strongly convex smooth decentralized optimization. Our numerical results show that NDCG and DMBFGS are very efficient in terms of both iteration and communication cost compared with other state-of-the-art methods for solving smooth decentralized optimization.
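The phrase "scaled memoryless BFGS" can be illustrated with the textbook construction: apply a single BFGS update to a scaled identity, using only the latest iterate difference s and gradient difference y. The sketch below is that generic construction, not the DMBFGS update itself. The function name, the fallback rule, and the default scaling tau = s^T y / y^T y are assumptions.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def memoryless_bfgs_direction(g, s, y, tau=None):
    """Return d = -H g, where H is one BFGS update of tau * I built from (s, y).

    H = (I - rho s y^T) (tau I) (I - rho y s^T) + rho s s^T, rho = 1 / (s^T y).
    The matrix H is never formed; only inner products are used.
    """
    sy = dot(s, y)
    if sy <= 1e-12:
        # Curvature condition fails: fall back to steepest descent (assumption).
        return [-gi for gi in g]
    if tau is None:
        tau = sy / dot(y, y)  # a common self-scaling choice (assumption)
    rho = 1.0 / sy
    sg = dot(s, g)
    v = [gi - rho * sg * yi for gi, yi in zip(g, y)]   # (I - rho y s^T) g
    w = [tau * vi for vi in v]                         # scale by tau
    yw = dot(y, w)
    u = [wi - rho * yw * si for wi, si in zip(w, s)]   # (I - rho s y^T) w
    return [-(ui + rho * sg * si) for ui, si in zip(u, s)]

# Hypothetical data: s is an iterate difference, y a gradient difference.
s = [1.0, 2.0]
y = [2.0, 1.0]
d = memoryless_bfgs_direction([1.0, -1.0], s, y)
```

Because s^T y > 0 in this example, H is positive definite and `d` is a descent direction. Passing `y` as the gradient returns `-s`, which is the secant condition H y = s.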

Paper Structure

This paper contains 14 sections, 14 theorems, 98 equations, 8 figures, 1 table, and 2 algorithms.

Key Result

Lemma 2.1

For $\tilde{{\bf{W}}}$ defined in Definition 1 and ${\bf{W}} :=\tilde{{\bf{W}}}\otimes{\bf{I}}_p$, we have

Figures (8)

  • Figure 2.1: Relative error and consensus error of SDCG using $\beta_{i}^{t,PRP}$ versus iterations for stepsizes 0.1, 0.01, 0.001, and 0.0001. Relative error is given by \ref{rel_error} and consensus error is defined as $\|{\bf{x}}^t-{\bf{M}}{\bf{x}}^t\|$.
  • Figure 2.2: $\frac{1}{n} \sum_{i=1}^n |\beta_{i}^t|$ generated by SDCG versus iterations for $\beta_{i}^t=\beta_{i}^{t,PRP}$.
  • Figure 3.1: Optimality error of comparison algorithms for minimizing the nonconvex logistic regression problem \ref{noncovex_logistic_problem} on different datasets.
  • Figure 3.2: Comparisons with gradient-based algorithms for minimizing the strongly convex linear regression problem \ref{linear_problem} with different condition numbers.
  • Figure 3.3: Comparisons with gradient-based algorithms for minimizing the strongly convex linear regression problem \ref{linear_problem} with different condition numbers.
  • ...and 3 more figures
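
The lifted matrix ${\bf{W}} := \tilde{{\bf{W}}}\otimes{\bf{I}}_p$ from Lemma 2.1 and the consensus error $\|{\bf{x}}^t-{\bf{M}}{\bf{x}}^t\|$ from the Figure 2.1 caption can be sketched as follows. This is an illustrative sketch under assumptions: ${\bf{M}}$ is taken to be the block-averaging operator $(\frac{1}{n}{\bf{1}}{\bf{1}}^T)\otimes{\bf{I}}_p$, which the excerpt does not define, and the 3-node path graph with Metropolis-style weights is a hypothetical example.

```python
import math

def kron_identity(Wt, p):
    """Return kron(Wt, I_p) as a dense list-of-lists matrix."""
    n = len(Wt)
    return [[Wt[i][j] if a == c else 0.0
             for j in range(n) for c in range(p)]
            for i in range(n) for a in range(p)]

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def consensus_error(x, n, p):
    """||x - M x||, with M ASSUMED to average the n blocks of length p."""
    mean = [sum(x[i * p + a] for i in range(n)) / n for a in range(p)]
    return math.sqrt(sum((x[i * p + a] - mean[a]) ** 2
                         for i in range(n) for a in range(p)))

# Hypothetical 3-node path graph; weights are symmetric and doubly stochastic.
Wt = [[2/3, 1/3, 0.0],
      [1/3, 1/3, 1/3],
      [0.0, 1/3, 2/3]]
n, p = 3, 2
W = kron_identity(Wt, p)

# Stacked vector x = [x_1; x_2; x_3], each local copy x_i in R^2.
x = [1.0, 0.0, 0.0, 2.0, 5.0, 4.0]
x_mixed = matvec(W, x)  # block i becomes sum_j Wt[i][j] * x_j

err_before = consensus_error(x, n, p)
err_after = consensus_error(x_mixed, n, p)
```

One mixing step keeps the block average unchanged and reduces the consensus error, so `err_after` is smaller than `err_before` in this example.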

Theorems & Definitions (29)

  • Definition 1
  • Lemma 2.1
  • Lemma 2.2
  • Lemma 2.3
  • Lemma 2.4
  • Remark 2.1
  • Remark 2.2
  • Lemma 2.5
  • Lemma 2.6
  • proof
  • ...and 19 more