Convergence of Decentralized Stochastic Subgradient-based Methods for Nonsmooth Nonconvex functions
Siyuan Zhang, Nachuan Xiao, Xin Liu
TL;DR
The paper introduces a unified decentralized stochastic subgradient framework for nonsmooth nonconvex optimization without Clarke regularity. By relating the discrete updates to a differential inclusion (DI) that admits a coercive Lyapunov function, it proves consensus and convergence to the DI's stable set under diminishing step-sizes, for both random reshuffling and with-replacement sampling. It shows that DSGD, DSGD-M, DSGD-T, and DSignSGD all fit into the framework and attain global or high-probability convergence to conservative-field-based critical points, thereby providing the first convergence guarantees in this setting. Preliminary experiments on nonsmooth neural networks demonstrate efficiency and robustness, with DSignSGD offering competitive performance. The work paves the way for further analysis of convergence rates, time-varying networks, asynchronous updates, and communication compression in nonsmooth decentralized optimization.
Abstract
In this paper, we study decentralized stochastic subgradient-based methods for minimizing nonsmooth nonconvex functions without Clarke regularity, with particular attention to the decentralized training of nonsmooth neural networks. We propose a general framework that unifies various decentralized subgradient-based methods, such as decentralized stochastic subgradient descent (DSGD), DSGD with gradient tracking (DSGD-T), and DSGD with momentum (DSGD-M). To establish the convergence properties of the proposed framework, we relate the discrete iterates to the trajectories of a continuous-time differential inclusion, which is assumed to have a coercive Lyapunov function with a stable set $\mathcal{A}$. We prove the asymptotic convergence of the iterates to the stable set $\mathcal{A}$ with sufficiently small and diminishing step-sizes. These results provide the first convergence guarantees for several well-recognized decentralized stochastic subgradient-based methods without Clarke regularity of the objective function. Preliminary numerical experiments demonstrate that the proposed framework yields highly efficient decentralized stochastic subgradient-based methods with convergence guarantees in the training of nonsmooth neural networks.
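To make the basic DSGD iteration concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a fixed ring network with a doubly stochastic mixing matrix, additive Gaussian noise on the subgradients, the step-size schedule $1/(k+1)^{0.75}$, and toy local objectives $f_i(x) = \|x - a_i\|_1$. These objectives are nonsmooth but convex, chosen only so that the result is easy to check. All names (`ring_mixing_matrix`, `dsgd`, `targets`) are hypothetical.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring of n >= 3 agents.

    Each agent averages itself and its two neighbours with weight 1/3.
    """
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


def dsgd(targets, num_iters=5000, noise_std=0.1, seed=0):
    """Toy DSGD on local objectives f_i(x) = ||x - a_i||_1.

    targets: array of shape (n, d); row i is a_i, held by agent i.
    Returns the (n, d) array of local iterates after num_iters steps.
    """
    rng = np.random.default_rng(seed)
    n, d = targets.shape
    W = ring_mixing_matrix(n)
    X = np.zeros((n, d))  # row i is agent i's local copy of the variable
    for k in range(num_iters):
        # Diminishing step-size (assumed schedule, not taken from the paper).
        step = 1.0 / (k + 1) ** 0.75
        # Noisy subgradient of ||x - a_i||_1 evaluated at each agent's iterate.
        G = np.sign(X - targets) + noise_std * rng.standard_normal((n, d))
        # Average with neighbours, then take a local subgradient step.
        X = W @ X - step * G
    return X


targets = np.random.default_rng(1).uniform(-1.0, 1.0, size=(5, 3))
X = dsgd(targets)
print("consensus error:", np.max(np.abs(X - X.mean(axis=0))))
print("distance to median:", np.max(np.abs(X.mean(axis=0) - np.median(targets, axis=0))))
```

In this toy problem the minimizer of $\sum_i f_i$ is the coordinate-wise median of the $a_i$, so both printed quantities should be small. The nonconvex, non-Clarke-regular setting treated in the paper requires the differential-inclusion analysis and is not captured by this example.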
