Decentralized Sum-of-Nonconvex Optimization
Zhuanghua Liu, Bryan Kian Hsiang Low
TL;DR
This work studies decentralized optimization of a sum-of-nonconvex objective F(x) = f(x) + ψ(x) with f(x) = (1/m)∑_i f_i(x), where each f_i is a sum of potentially nonconvex components f_{i,j}. It first gives a new linear-convergence analysis of PMGT-SVRG and then proposes PMGT-KatyushaX, an accelerated decentralized method that combines KatyushaX-style acceleration, gradient tracking, and multi-consensus mixing. The theory shows that PMGT-SVRG converges linearly with a linear dependence on the condition number κ, while the accelerated PMGT-KatyushaX improves this to a √κ dependence; both results come with explicit stochastic first-order oracle (SFO) and communication complexities. Experiments on synthetic and real data validate the improvements. The results highlight computation–communication trade-offs for ill-conditioned distributed optimization and point to future work on further reducing communication overhead in decentralized settings.
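To make the terms "gradient tracking" and "multi-consensus mixing" concrete, here is a minimal Python sketch on a toy problem. It is not PMGT-SVRG or PMGT-KatyushaX: it uses full local gradients instead of a variance-reduced estimator, omits the ψ term and the acceleration, and replaces whatever mixing routine the paper uses with plain repeated averaging. The quadratic local objectives, the 4-agent ring matrix `W`, the step size, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def multi_consensus(W, X, rounds):
    """Apply the mixing matrix `rounds` times (plain repeated averaging)."""
    for _ in range(rounds):
        X = W @ X
    return X

def gradient_tracking(a, C, W, step=0.05, rounds=3, iters=300):
    """Toy tracking loop for local objectives f_i(x) = 0.5 * a[i] * ||x - C[i]||^2."""
    grad = lambda X: a[:, None] * (X - C)   # row i holds the gradient of f_i at X[i]
    X = np.zeros_like(C)                    # one iterate per agent (rows)
    G = grad(X)
    S = G.copy()                            # tracker starts at the local gradients
    for _ in range(iters):
        # Local descent step along the tracked direction, then mix with neighbors.
        X_new = multi_consensus(W, X - step * S, rounds)
        G_new = grad(X_new)
        # Tracker update: add the change in local gradients, then mix.
        S = multi_consensus(W, S + G_new - G, rounds)
        X, G = X_new, G_new
    return X, S, G

# Hypothetical 4-agent ring with a symmetric, doubly stochastic mixing matrix.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
a = np.array([1.0, 2.0, 3.0, 4.0])
C = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 2.0], [3.0, -1.0]])

X, S, G = gradient_tracking(a, C, W)
x_star = (a[:, None] * C).sum(axis=0) / a.sum()   # minimizer of the average objective
print(np.abs(X - x_star).max())
```

Because `W` is doubly stochastic, the average of the tracker rows `S` stays equal to the average of the local gradients `G` at every iteration, which is the property that lets each agent follow the global gradient using only neighbor communication.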
Abstract
We consider the problem of minimizing a sum-of-nonconvex function, i.e., a convex function that is the average of nonconvex components. Existing stochastic algorithms for this problem consider only the single-machine and centralized settings. In this paper, we study sum-of-nonconvex optimization in the decentralized setting. We present a new theoretical analysis of the PMGT-SVRG algorithm for this problem and prove that it converges linearly. However, the convergence rate of PMGT-SVRG depends linearly on the condition number, which is undesirable for ill-conditioned problems. To remedy this issue, we propose an accelerated stochastic decentralized first-order algorithm that incorporates acceleration, gradient tracking, and multi-consensus mixing into the SVRG algorithm. The convergence rate of the proposed method has a square-root dependency on the condition number. Numerical experiments on both synthetic and real-world datasets validate the theoretical guarantees of the proposed algorithms.
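As a small illustration of what "a convex function that is the average of nonconvex components" can look like, the sketch below builds quadratic components f_j(x) = 0.5 xᵀH_j x whose Hessians H_j are each indefinite while their average is the identity. The construction, the function name `make_components`, and the parameter values are assumptions for illustration and are not taken from the paper's experiments.

```python
import numpy as np

def make_components(n=4, d=3, scale=5.0):
    """Return n symmetric matrices H_j; component j is f_j(x) = 0.5 * x^T H_j x."""
    assert n % 2 == 0 and d >= 2
    D = np.zeros((d, d))
    D[0, 0], D[1, 1] = 1.0, -1.0          # one positive and one negative direction
    signs = np.array([1.0 if j % 2 == 0 else -1.0 for j in range(n)])
    # The perturbations +scale*D and -scale*D cancel in the average.
    return np.stack([np.eye(d) + s * scale * D for s in signs])

H = make_components()
component_min_eigs = np.array([np.linalg.eigvalsh(Hj).min() for Hj in H])
average_min_eig = np.linalg.eigvalsh(H.mean(axis=0)).min()

print("smallest eigenvalue of each component Hessian:", component_min_eigs)
print("smallest eigenvalue of the averaged Hessian:", average_min_eig)
```

Every component has a negative Hessian eigenvalue, so no f_j is convex, yet the averaged objective is strongly convex. Increasing `scale` makes the components more nonconvex without changing the average.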
