A Two-timescale Primal-dual Algorithm for Decentralized Optimization with Compression
Haoming Liu, Chung-Yiu Yau, Hoi-To Wai
TL;DR
This work addresses decentralized optimization under communication constraints by introducing TiCoPD, a two-timescale primal-dual algorithm that supports nonlinear compression through a majorization-minimization (MM) surrogate. By decoupling communication from optimization via a compressed surrogate $\hat{\mathbf{X}}^t$ and employing a contractive compressor, TiCoPD converges with a constant step size and finds an $O(1/T)$ stationary point after $T$ iterations, without assuming bounded gradient heterogeneity. The main contributions are the MM-based surrogate, the two-timescale update, and a convergence guarantee under standard smoothness and compression assumptions; the method is validated on decentralized training of a neural network. Overall, the method reduces communication overhead in distributed learning and broadens the applicability of compression-enabled decentralized optimization.
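A common way to state that a compressor $\mathcal{Q}$ is contractive is $\|\mathcal{Q}(x)-x\|^2 \le (1-\delta)\|x\|^2$ for some $\delta \in (0,1]$. Top-$k$ sparsification satisfies this with $\delta = k/d$, because the discarded entries are the $d-k$ smallest in magnitude and therefore carry at most a $(d-k)/d$ fraction of the energy. The Python sketch below assumes top-$k$ as the example compressor (the paper's experiments may use a different one); the helper names `top_k` and `worst_contraction_ratio` are hypothetical and only check the bound empirically.

```python
import numpy as np


def top_k(x: np.ndarray, k: int) -> np.ndarray:
    """Top-k sparsifier: keep the k largest-magnitude entries of x, zero the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]  # indices of the k largest |x_i|
    out[idx] = x[idx]
    return out


def worst_contraction_ratio(d: int, k: int, trials: int = 1000, seed: int = 0) -> float:
    """Largest observed ||top_k(x) - x||^2 / ||x||^2 over random Gaussian vectors."""
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(trials):
        x = rng.standard_normal(d)
        ratio = np.sum((top_k(x, k) - x) ** 2) / np.sum(x ** 2)
        worst = max(worst, float(ratio))
    return worst


# The observed ratio never exceeds the guaranteed bound 1 - k/d = 0.9.
print(worst_contraction_ratio(100, 10))
```

Only $k$ values and their indices need to be transmitted, which is where the communication savings come from.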
Abstract
This paper proposes a two-timescale compressed primal-dual (TiCoPD) algorithm for decentralized optimization that improves communication efficiency over prior primal-dual decentralized optimization methods. The algorithm is built upon the primal-dual optimization framework and utilizes a majorization-minimization procedure, which naturally leads the agents to share a compressed difference term at each iteration. Furthermore, the TiCoPD algorithm combines a fast-timescale mirror sequence, through which the agents reach consensus on nonlinearly compressed terms, with a slow-timescale primal-dual recursion that optimizes the objective function. We show that the TiCoPD algorithm converges with a constant step size and finds an $O(1/T)$ stationary solution after $T$ iterations. Numerical experiments on decentralized training of a neural network validate the efficacy of the TiCoPD algorithm.
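To make the two timescales concrete, here is a minimal toy sketch. It is not the TiCoPD recursion itself (which is derived from the MM surrogate) but a generic Laplacian-based primal-dual iteration on $n$ agents over a ring, with local objectives $f_i(x)=\tfrac12\|x-b_i\|^2$. The consensus terms use mirror copies $\hat{x}_i$ that are refreshed only through top-$k$ compressed differences $\mathcal{Q}(x_i-\hat{x}_i)$. All names (`ring_laplacian`, `top_k_rows`, `compressed_primal_dual`) and parameter values (`eta`, `gamma`, `alpha`, `beta`, `k`) are illustrative assumptions. The primal and dual steps use a small step size `eta` (slow timescale), while the mirror update uses `gamma = 1` (fast timescale).

```python
import numpy as np


def ring_laplacian(n: int) -> np.ndarray:
    """Graph Laplacian L = D - A of an n-node ring (assumes n >= 3)."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return np.diag(A.sum(axis=1)) - A


def top_k_rows(M: np.ndarray, k: int) -> np.ndarray:
    """Top-k sparsification applied separately to each row (one row per agent)."""
    idx = np.argpartition(np.abs(M), -k, axis=1)[:, -k:]
    out = np.zeros_like(M)
    np.put_along_axis(out, idx, np.take_along_axis(M, idx, axis=1), axis=1)
    return out


def compressed_primal_dual(b, k, eta=0.002, gamma=1.0, alpha=1.0, beta=1.0, iters=30000):
    """Toy two-timescale loop for min sum_i 0.5*||x_i - b_i||^2 s.t. consensus."""
    n, d = b.shape
    L = ring_laplacian(n)
    x = np.zeros((n, d))     # primal iterates, one row per agent
    v = np.zeros((n, d))     # dual variables; increments lie in the range of L
    xhat = np.zeros((n, d))  # mirror copies, known to each agent and its neighbours
    for _ in range(iters):
        Lxh = L @ xhat                                     # consensus term from mirrors only
        grad = x - b                                       # local gradients of f_i
        x_new = x - eta * (grad + alpha * Lxh + beta * v)  # slow primal step
        v = v + eta * beta * Lxh                           # slow dual step
        x = x_new
        # Fast timescale: each agent transmits only the top-k of (x_i - xhat_i).
        xhat = xhat + gamma * top_k_rows(x - xhat, k)
    return x, xhat


rng = np.random.default_rng(1)
b = rng.standard_normal((5, 10))           # 5 agents, 10-dimensional variables
x, xhat = compressed_primal_dual(b, k=5)   # each message keeps 5 of 10 coordinates
print(np.abs(x - b.mean(axis=0)).max())    # worst distance of any agent from the optimum
```

Because the dual variables start at zero and are only incremented by multiples of $L\hat{X}$, their rows sum to zero, so any fixed point satisfies consensus together with $\sum_i \nabla f_i(x^\star)=0$. For this quadratic toy problem every agent should therefore approach the mean of the rows of `b`, and the mirrors `xhat` should approach `x`.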
