A Stochastic Approximation Approach for Efficient Decentralized Optimization on Random Networks
Chung-Yiu Yau, Haoming Liu, Hoi-To Wai
TL;DR
This paper tackles decentralized optimization over time-varying random networks with unreliable and bandwidth-constrained communication. It introduces the Fully Stochastic Primal Dual Algorithm (FSPDA) framework, which uses a stochastic augmented Lagrangian to incorporate topology randomness and seeks its saddle points via stochastic approximation. The authors develop two variants, FSPDA-SA and FSPDA-STORM, which achieve rates of $O(1/\sqrt{T})$ and $O(1/T^{2/3})$, respectively, for smooth (possibly non-convex) objectives, and linear convergence under the PL condition; both support sparsified communication and asynchronous operation. Empirical results on MNIST and ImageNet show improved iteration and communication efficiency over baselines, supporting the framework’s robustness to topology randomness and its practical utility for large-scale distributed learning.
Abstract
A challenging problem in decentralized optimization is to develop algorithms that converge quickly on random, time-varying topologies over unreliable and bandwidth-constrained communication networks. This paper studies a stochastic approximation approach through the Fully Stochastic Primal Dual Algorithm (FSPDA) framework. The framework relies on the novel observation that the randomness of a time-varying topology can be incorporated into a stochastic augmented Lagrangian formulation, whose expected value admits saddle points that coincide with stationary solutions of the decentralized optimization problem. With the FSPDA framework, we develop two new algorithms supporting efficient sparsified communication on random time-varying topologies: FSPDA-SA allows agents to execute multiple local gradient steps depending on the time-varying topology to accelerate convergence, and FSPDA-STORM further incorporates a variance reduction step to improve sample complexity. For problems with smooth (possibly non-convex) objective functions, we show that within $T$ iterations FSPDA-SA finds an $\mathcal{O}(1/\sqrt{T})$-stationary solution and FSPDA-STORM finds an $\mathcal{O}(1/T^{2/3})$-stationary solution. Numerical experiments show the benefits of the FSPDA algorithms.
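To make the primal-dual idea concrete, the following is a minimal sketch of a primal-dual consensus iteration on a randomly activated graph. It is not the authors' FSPDA-SA or FSPDA-STORM: it assumes a toy strongly convex objective $f_i(x) = \tfrac{1}{2}(x - b_i)^2$ per agent, exact gradients, one scalar per agent, and no sparsification, local steps, or variance reduction. The function name `primal_dual_random_graph`, the step sizes `alpha`, `beta`, `gamma`, and the edge activation probability `p_active` are all hypothetical choices made for illustration.

```python
import random


def primal_dual_random_graph(b, edges, p_active=0.5, alpha=0.05, beta=0.05,
                             gamma=1.0, num_iters=3000, seed=0):
    """Illustrative primal-dual consensus on a randomly activated graph.

    Agent i holds the toy objective f_i(x) = 0.5 * (x - b[i]) ** 2, so the
    minimiser of the average objective is mean(b). All parameter names and
    default values are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n    # primal variable: one scalar per agent
    lam = [0.0] * n  # node-wise dual variable: one scalar per agent
    for _ in range(num_iters):
        # Random topology: each edge is active independently w.p. p_active.
        active = [e for e in edges if rng.random() < p_active]
        # Disagreement with neighbours over the active edges only,
        # i.e. disagree[i] = sum over active neighbours j of (x[i] - x[j]).
        disagree = [0.0] * n
        for i, j in active:
            diff = x[i] - x[j]
            disagree[i] += diff
            disagree[j] -= diff
        # Primal step: local gradient + dual variable + consensus penalty.
        new_x = [x[i] - alpha * ((x[i] - b[i]) + lam[i] + gamma * disagree[i])
                 for i in range(n)]
        # Dual step: accumulate the disagreement computed from the old x.
        lam = [lam[i] + beta * disagree[i] for i in range(n)]
        x = new_x
    return x, lam


# Usage: 8 agents on a ring, each edge active in roughly half the rounds.
n = 8
ring = [(i, (i + 1) % n) for i in range(n)]
b = [float(i) for i in range(n)]
x, lam = primal_dual_random_graph(b, ring)
print(max(abs(xi - sum(b) / n) for xi in x))  # distance to mean(b)
```

In this sketch agents only exchange values over the edges that happen to be active in a round, and the dual variable accumulates the observed disagreement so that, at a fixed point, every agent's local gradient is cancelled by its dual term while all agents agree. With these assumed step sizes, all entries of `x` should end up close to `mean(b)`.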
