Online Optimization Perspective on First-Order and Zero-Order Decentralized Nonsmooth Nonconvex Stochastic Optimization
Emre Sahinoglu, Shahin Shahrampour
TL;DR
This work studies finite-time decentralized optimization for nonsmooth nonconvex stochastic objectives and introduces Multi Epoch Decentralized Online Learning (ME-DOL). It combines online-to-nonconvex conversion with randomized smoothing to derive $(\delta,\epsilon)$-stationarity guarantees in the smooth, nonsmooth first-order, and nonsmooth zero-order settings. In the two nonsmooth settings it achieves a sample complexity of $O(\delta^{-1}\epsilon^{-3})$, matching the centralized result. A key contribution is the explicit handling of network effects through the connectivity parameter $1-\rho$, which yields improved iteration complexity on well-connected networks. Empirical results on nonconvex penalized SVM problems demonstrate superior performance and validate the theoretical rates, including favorable behavior across different network topologies and with zero-order versus first-order oracles.
Abstract
We investigate the finite-time analysis of finding $(\delta,\epsilon)$-stationary points of nonsmooth nonconvex objectives in decentralized stochastic optimization. A set of agents aims to minimize a global function using only local information while interacting over a network. We present a novel algorithm, called Multi Epoch Decentralized Online Learning (ME-DOL), for which we establish the sample complexity in various settings. First, using a recently proposed online-to-nonconvex technique, we show that our algorithm recovers the optimal convergence rate for smooth nonconvex objectives. We then extend our analysis to the nonsmooth setting, building on properties of randomized smoothing and Goldstein-subdifferential sets. We establish a sample complexity of $O(\delta^{-1}\epsilon^{-3})$, which to the best of our knowledge is the first finite-time guarantee for decentralized nonsmooth nonconvex stochastic optimization in the first-order setting (without weak convexity), and which matches its optimal centralized counterpart. We further prove the same rate for the zero-order oracle setting without using variance reduction.
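For reference, the notions named above can be written out as follows. This is a sketch using the standard definitions from the nonsmooth nonconvex literature; the symbols $F$, $\mathbb{B}$, and $u$ are notation assumed here rather than taken from the abstract, and the paper's own definitions may differ in detail. For a Lipschitz function $F$ with Clarke subdifferential $\partial F$, the Goldstein $\delta$-subdifferential at $x$ is

$$
\partial_\delta F(x) = \mathrm{conv}\Big( \bigcup_{y:\, \|y - x\| \le \delta} \partial F(y) \Big),
$$

and a point $x$ is called $(\delta,\epsilon)$-stationary if

$$
\min_{g \in \partial_\delta F(x)} \|g\| \le \epsilon .
$$

Randomized smoothing replaces $F$ with the surrogate

$$
F_\delta(x) = \mathbb{E}_{u \sim \mathrm{Unif}(\mathbb{B})}\big[ F(x + \delta u) \big],
$$

where $\mathbb{B}$ is the unit Euclidean ball. The surrogate $F_\delta$ is differentiable and, for Lipschitz $F$, satisfies $\nabla F_\delta(x) \in \partial_\delta F(x)$, so a point with small $\|\nabla F_\delta(x)\|$ is also $(\delta,\epsilon)$-stationary for $F$. This link is what typically allows guarantees for the smoothed problem to carry over to the nonsmooth one.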
