Decentralized Gradient-Free Methods for Stochastic Non-Smooth Non-Convex Optimization

Zhenwei Lin; Jingfan Xia; Qi Deng; Luo Luo

TL;DR

The paper addresses decentralized optimization of non-smooth, non-convex Lipschitz objectives using gradient-free methods. It introduces DGFM and DGFM$^+$, which combine randomized smoothing and gradient tracking (plus variance reduction in DGFM$^+$) and come with provable convergence guarantees. DGFM reaches a $(\delta,\varepsilon)$-Goldstein stationary point in $O(d^{3/2}\delta^{-1}\varepsilon^{-4})$ zeroth-order calls; DGFM$^+$ improves this to $O(d^{3/2}\delta^{-1}\varepsilon^{-3})$ via SPIDER, while keeping the number of communication rounds comparable to the number of iterations. Experiments on nonconvex SVM and universal adversarial attacks corroborate the practical benefits of the decentralized zeroth-order framework, especially the variance-reduced variant.

Abstract

We consider decentralized gradient-free minimization of Lipschitz continuous functions that satisfy neither smoothness nor convexity assumptions. We propose two novel gradient-free algorithms, the Decentralized Gradient-Free Method (DGFM) and its variant, the Decentralized Gradient-Free Method$^+$ (DGFM$^{+}$). Based on the techniques of randomized smoothing and gradient tracking, DGFM requires the zeroth-order oracle of only a single sample in each iteration, making it less demanding in terms of computational resources for individual computing nodes. Theoretically, DGFM achieves a complexity of $\mathcal O(d^{3/2}\delta^{-1}\varepsilon^{-4})$ for obtaining a $(\delta,\varepsilon)$-Goldstein stationary point. DGFM$^{+}$, an advanced version of DGFM, incorporates variance reduction to further improve the convergence behavior. It samples a mini-batch at each iteration and periodically draws a larger batch of data, which improves the complexity to $\mathcal O(d^{3/2}\delta^{-1}\varepsilon^{-3})$. Moreover, experimental results underscore the empirical advantages of our proposed algorithms when applied to real-world datasets.
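To make the gradient-tracking idea concrete, the following is a minimal sketch of a generic gradient-tracking loop driven by a zeroth-order estimate. It is a textbook-style template, not the paper's algorithm listing: the update order, the mixing matrix `W`, the step size `eta`, the capped absolute loss, and the `targets` data are all illustrative assumptions, and the demo uses a one-dimensional variable to stay short.

```python
import random

def mix(W, rows):
    """Return W @ rows, where rows[i] is node i's vector (one gossip round)."""
    m, d = len(rows), len(rows[0])
    return [[sum(W[i][j] * rows[j][k] for j in range(m)) for k in range(d)]
            for i in range(m)]

def gradient_tracking(W, x0, estimate, eta, num_iters):
    """Generic gradient-tracking template (assumed form, not the paper's listing).

    W        : doubly stochastic mixing matrix (list of lists)
    x0       : initial local iterates, one vector per node
    estimate : callable (node_index, x_i) -> local gradient estimate
    """
    m = len(x0)
    x = [list(xi) for xi in x0]
    g = [estimate(i, x[i]) for i in range(m)]
    v = [list(gi) for gi in g]  # trackers start at the local estimates
    for _ in range(num_iters):
        # Local step along the tracker, then average with neighbors.
        x = mix(W, [[a - eta * b for a, b in zip(x[i], v[i])] for i in range(m)])
        g_new = [estimate(i, x[i]) for i in range(m)]
        # Tracker update: add the change in the local estimate, then average.
        v = mix(W, [[a + b - c for a, b, c in zip(v[i], g_new[i], g[i])]
                    for i in range(m)])
        g = g_new
    return x, v, g

rng = random.Random(0)
delta = 0.1
targets = [-1.0, 0.0, 2.0]  # hypothetical local data, one value per node

def local_loss(i, x):
    # Capped absolute loss: Lipschitz, non-smooth, non-convex.
    return min(abs(x[0] - targets[i]), 1.0)

def estimate(i, x):
    # Two-point estimate with d = 1; the unit sphere in one dimension is {-1, +1},
    # so either sign of w yields the same central difference.
    w = rng.choice([-1.0, 1.0])
    diff = local_loss(i, [x[0] + delta * w]) - local_loss(i, [x[0] - delta * w])
    return [diff / (2.0 * delta) * w]

W = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
x, v, g = gradient_tracking(W, [[0.0], [0.0], [0.0]], estimate,
                            eta=0.05, num_iters=50)
```

Because `W` is doubly stochastic and the trackers are initialized at the local estimates, the average of the trackers `v` equals the average of the current local estimates `g` at every iteration. This is the property that lets each node follow an estimate of the global gradient using only neighbor communication.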

Paper Structure (36 sections, 30 theorems, 112 equations, 2 figures, 3 tables, 4 algorithms)

Introduction
Contributions
Preliminaries
Notations.
Stationary condition.
Randomized Smoothing.
DGFM
DGFM$^{+}$
Numerical Study
Nonconvex SVM with Capped-$\ell_1$ Penalty
Data:
Model:
Network topology:
Performance measures:
Comparison:
...and 21 more sections

Key Result

Proposition 2.1

For the zeroth-order oracle estimator in Definition 2.4, we have $\mathbb{E}_{w,\xi}[g(x;w,\xi)]=\nabla f_{\delta}(x)$ and $\mathbb{E}_{w,\xi}[\Vert g(x;w,\xi)\Vert^{2}]\leq 16\sqrt{2\pi}\,dL_{f}^{2}$.
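The estimator of Definition 2.4 is not reproduced on this page, so the sketch below assumes the two-point form that is common in the gradient-free literature, $g(x;w,\xi)=\frac{d}{2\delta}\bigl(F(x+\delta w;\xi)-F(x-\delta w;\xi)\bigr)w$ with $w$ uniform on the unit sphere. The function names and the linear test function are hypothetical, and the stochastic sample $\xi$ is folded into the callable `F`.

```python
import math
import random

def sample_unit_sphere(d, rng):
    """Draw w uniformly from the unit sphere in R^d via a normalized Gaussian."""
    while True:
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in z))
        if norm > 0.0:
            return [c / norm for c in z]

def zeroth_order_estimate(F, x, delta, rng):
    """Two-point estimate (assumed form): d/(2*delta) * (F(x+dw) - F(x-dw)) * w."""
    d = len(x)
    w = sample_unit_sphere(d, rng)
    x_plus = [xi + delta * wi for xi, wi in zip(x, w)]
    x_minus = [xi - delta * wi for xi, wi in zip(x, w)]
    scale = d / (2.0 * delta) * (F(x_plus) - F(x_minus))
    return [scale * wi for wi in w]

# Sanity check on a linear function F(x) = <a, x>. Smoothing leaves a linear
# function unchanged, so the estimator should average to a.
rng = random.Random(0)
a = [1.0, -2.0, 0.5]
F = lambda x: sum(ai * xi for ai, xi in zip(a, x))
n = 20000
avg = [0.0, 0.0, 0.0]
for _ in range(n):
    g = zeroth_order_estimate(F, [0.3, 0.1, -0.7], delta=0.1, rng=rng)
    avg = [s + gi / n for s, gi in zip(avg, g)]
```

Under this assumed form, each estimate costs two function evaluations and no gradient evaluation. For an $L_f$-Lipschitz $F$ the difference term is at most $2\delta L_f$ in magnitude, so $\Vert g\Vert\le dL_f$ for every draw; the proposition's second-moment bound is sharper in its dependence on $d$.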

Figures (2)

Figure 1: We assess the convergence performance of four algorithms by plotting the objective function value on the $y$-axis against the number of zeroth-order calls on the $x$-axis.
Figure 2: We assess the attacking performance of four algorithms by plotting the accuracy after attacking on the $y$-axis against the number of zeroth-order calls on the $x$-axis.

Theorems & Definitions (52)

Definition 2.1
Definition 2.2: $(\delta,\varepsilon)$-Goldstein Stationary Point
Definition 2.3: Randomized smoothing
Definition 2.4: Zeroth-order oracle estimators
Proposition 2.1: Lemma D.1 of lin2022gradient
Proposition 2.2: Proposition 2.2 of chen2023faster
Remark 1
Proposition 2.3
Lemma 3.1
Lemma 3.2: Consensus error decay
...and 42 more
