Decentralized Sum-of-Nonconvex Optimization

Zhuanghua Liu, Bryan Kian Hsiang Low

TL;DR

This work studies decentralized optimization for a sum-of-nonconvex objective F with F(x) = f(x) + ψ(x) and f(x) = (1/m)∑_i f_i(x), where each f_i is a sum of potentially nonconvex components f_{i,j}. It gives a new linear-convergence analysis of PMGT-SVRG for this problem and then proposes PMGT-KatyushaX, an accelerated decentralized method that combines KatyushaX-style acceleration, gradient tracking, and multi-consensus mixing. The theory shows that the rate of PMGT-SVRG depends linearly on the condition number kappa, while the accelerated PMGT-KatyushaX depends only on sqrt(kappa); both results come with explicit SFO and communication complexities. Experiments on synthetic and real data support the theory. The work highlights computation–communication trade-offs for ill-conditioned distributed optimization and points to future work on further reducing communication overhead in decentralized settings.

Abstract

We consider the problem of minimizing a sum-of-nonconvex function, i.e., a convex function that is the average of nonconvex components. Existing stochastic algorithms for this problem focus only on the single-machine and centralized scenarios. In this paper, we study sum-of-nonconvex optimization in the decentralized setting. We present a new theoretical analysis of the PMGT-SVRG algorithm for this problem and prove that it converges linearly. However, the convergence rate of PMGT-SVRG depends linearly on the condition number, which is undesirable for ill-conditioned problems. To remedy this issue, we propose an accelerated stochastic decentralized first-order algorithm that incorporates acceleration, gradient tracking, and multi-consensus mixing into the SVRG algorithm. The convergence rate of the proposed method has a square-root dependency on the condition number. Numerical experiments on both synthetic and real-world datasets validate the theoretical guarantees of the proposed algorithms.
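To make "a convex function that is the average of nonconvex components" concrete, here is a minimal NumPy sketch. It is an illustration only, not the paper's experimental objective: the function `make_components`, the quadratic form, and the parameters `n`, `d`, and `shift` are all assumptions chosen for simplicity. Each component is a quadratic f_j(x) = 0.5 x^T H_j x with H_j = A ± shift·I, so half of the components are nonconvex while the shifts cancel in the average.

```python
import numpy as np

def make_components(n=4, d=3, shift=3.0):
    """Return (A, hessians) with hessians[j] = A + s_j * shift * I, s_j = +1/-1.

    Hypothetical toy construction: n must be even so the shifts cancel
    and the average Hessian equals A exactly.
    """
    assert n % 2 == 0, "need an even number of components so shifts cancel"
    A = np.diag(np.linspace(1.0, 2.0, d))            # eigenvalues in [1, 2]
    signs = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
    hessians = np.array([A + s * shift * np.eye(d) for s in signs])
    return A, hessians

A, hessians = make_components()

# Smallest Hessian eigenvalue of each component: negative means nonconvex.
min_eigs = np.array([np.linalg.eigvalsh(H).min() for H in hessians])

# The average of the components has Hessian A, which is positive definite.
avg_hessian = hessians.mean(axis=0)
avg_min_eig = np.linalg.eigvalsh(avg_hessian).min()

print("smallest eigenvalue per component:", min_eigs)
print("smallest eigenvalue of the average:", avg_min_eig)
```

In this toy instance the odd-indexed components have a negative smallest eigenvalue, yet the averaged function is strongly convex with modulus 1, which is the structure the abstract describes.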

Paper Structure

This paper contains 33 sections, 29 theorems, 148 equations, 2 figures, 1 table, 3 algorithms.

Key Result

Theorem 4.1

Assume function $F(\cdot)$ defined in (obj) is $\sigma$-strongly convex, $f(\cdot)$ is $L$-smooth, and each component $f_{i,j}$ is $(\ell_1, \ell_2)$-smooth. Additionally, we assume that the underlying network matrix $W$ is doubly stochastic so it satisfies the properties in Definition doubly_stocha SFO calls and rounds of communication.
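The doubly stochastic assumption on $W$ can be illustrated with a small NumPy sketch. The definition the theorem refers to is not reproduced in this excerpt, so the check below assumes the standard notion (nonnegative entries, rows and columns summing to one). The ring topology, the 1/3 weights, and the helper names `ring_mixing_matrix`, `is_doubly_stochastic`, and `multi_round_mix` are hypothetical choices for illustration. Plain repeated mixing is used as a stand-in for multi-consensus mixing; the paper's actual mixing routine may differ.

```python
import numpy as np

def ring_mixing_matrix(m):
    """Mixing matrix for a ring of m >= 3 agents: each agent averages
    itself and its two neighbours with weight 1/3 (assumed topology)."""
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] = W[i, (i - 1) % m] = W[i, (i + 1) % m] = 1.0 / 3.0
    return W

def is_doubly_stochastic(W, tol=1e-12):
    """Check nonnegativity and unit row and column sums."""
    return bool(
        (W >= -tol).all()
        and np.allclose(W.sum(axis=0), 1.0, atol=tol)
        and np.allclose(W.sum(axis=1), 1.0, atol=tol)
    )

def multi_round_mix(W, X, rounds):
    """Apply `rounds` plain gossip steps X <- W X, one communication each."""
    for _ in range(rounds):
        X = W @ X
    return X

m = 8
W = ring_mixing_matrix(m)
X = np.arange(m, dtype=float).reshape(m, 1)    # one scalar per agent
mixed = multi_round_mix(W, X, rounds=50)

# Distance of every agent's value from the network-wide average.
consensus_error = np.abs(mixed - X.mean()).max()
print("doubly stochastic:", is_doubly_stochastic(W))
print("max deviation from the average after 50 rounds:", consensus_error)
```

Because the column sums are one, mixing preserves the network average, and because the row sums are one, the average is a fixed point. Several mixing rounds per iteration therefore trade extra communication for a smaller consensus error.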

Figures (2)

  • Figure 1: Performance comparison between PGEXTRA, NIDS, PMGT-SVRG, and PMGT-KatyushaX on the synthetic dataset. The left column shows results for the ratio $r=2$ and the right column shows results for the ratio $r=300$, where $r$ is defined in Problem (\ref{exp_obj}). The curves for PGEXTRA and NIDS overlap because their performance is nearly identical.
  • Figure 2: Performance comparison between PGEXTRA, NIDS, PMGT-SVRG, and PMGT-KatyushaX on the Covtype dataset. The left column shows results for the ratio $r=2$ and the right column shows results for the ratio $r=300$, where $r$ is defined in Problem (\ref{exp_obj}). The curves for PGEXTRA and NIDS overlap because their performance is nearly identical.

Theorems & Definitions (50)

  • Definition 3.1
  • Definition 3.2
  • Theorem 4.1
  • Remark 4.2
  • Theorem 5.1
  • Remark 5.2
  • Corollary 5.3
  • Lemma 6.1: liu2011accelerated
  • Lemma 6.2
  • Lemma 6.3
  • ...and 40 more