Decentralized Gradient-Free Methods for Stochastic Non-Smooth Non-Convex Optimization
Zhenwei Lin, Jingfan Xia, Qi Deng, Luo Luo
TL;DR
The paper addresses decentralized optimization of non-smooth, non-convex Lipschitz objectives using gradient-free methods. It introduces DGFM and DGFM$^+$, which combine randomized smoothing, gradient tracking, and variance reduction to achieve provable convergence guarantees with favorable zeroth-order complexity. Theoretical results show the base method reaches a $(\delta,\varepsilon)$-Goldstein stationary point in $O(d^{3/2}\delta^{-1}\varepsilon^{-4})$ zeroth-order calls, improved to $O(d^{3/2}\delta^{-1}\varepsilon^{-3})$ via SPIDER in DGFM$^+$, while maintaining comparable communication to iterations. Empirical studies on nonconvex SVM and universal adversarial attacks corroborate the practical benefits of the decentralized zeroth-order framework, especially the variance-reduced variant.
Abstract
We consider decentralized gradient-free optimization of minimizing Lipschitz continuous functions that satisfy neither smoothness nor convexity assumption. We propose two novel gradient-free algorithms, the Decentralized Gradient-Free Method (DGFM) and its variant, the Decentralized Gradient-Free Method$^+$ (DGFM$^{+}$). Based on the techniques of randomized smoothing and gradient tracking, DGFM requires the computation of the zeroth-order oracle of a single sample in each iteration, making it less demanding in terms of computational resources for individual computing nodes. Theoretically, DGFM achieves a complexity of $\mathcal O(d^{3/2}δ^{-1}\varepsilon ^{-4})$ for obtaining an $(δ,\varepsilon)$-Goldstein stationary point. DGFM$^{+}$, an advanced version of DGFM, incorporates variance reduction to further improve the convergence behavior. It samples a mini-batch at each iteration and periodically draws a larger batch of data, which improves the complexity to $\mathcal O(d^{3/2}δ^{-1} \varepsilon^{-3})$. Moreover, experimental results underscore the empirical advantages of our proposed algorithms when applied to real-world datasets.
