Decentralized Gradient-Free Methods for Stochastic Non-Smooth Non-Convex Optimization

Zhenwei Lin, Jingfan Xia, Qi Deng, Luo Luo

TL;DR

The paper addresses decentralized optimization of non-smooth, non-convex Lipschitz objectives using gradient-free methods. It introduces DGFM and DGFM$^+$, which combine randomized smoothing, gradient tracking, and (in DGFM$^+$) variance reduction to obtain provable convergence guarantees in terms of zeroth-order complexity. The base method reaches a $(\delta,\varepsilon)$-Goldstein stationary point in $O(d^{3/2}\delta^{-1}\varepsilon^{-4})$ zeroth-order calls; DGFM$^+$ improves this to $O(d^{3/2}\delta^{-1}\varepsilon^{-3})$ via SPIDER-style variance reduction, with communication cost comparable to the number of iterations. Experiments on nonconvex SVM and universal adversarial attacks support the practical benefits of the decentralized zeroth-order framework, especially the variance-reduced variant.
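To make the SPIDER-style idea concrete, the following is a minimal single-node sketch, not the paper's DGFM$^+$ algorithm. It assumes the common two-point zeroth-order estimator $\frac{d}{2\delta}\big(F(x+\delta w;\xi)-F(x-\delta w;\xi)\big)w$ with $w$ uniform on the unit sphere, a toy objective $F(x;\xi)=\Vert x-\xi\Vert_1$, and hypothetical names and parameters (`spider_zo`, `period`, `small_batch`, `big_batch`). It omits the decentralized communication and gradient-tracking steps entirely.

```python
import numpy as np

def unit_sphere(rng, d):
    """Draw a direction uniformly from the unit sphere in R^d."""
    w = rng.standard_normal(d)
    return w / np.linalg.norm(w)

def zo_grad(F, x, xi, w, delta):
    """Two-point zeroth-order estimate along direction w (assumed form)."""
    return x.size / (2 * delta) * (F(x + delta * w, xi) - F(x - delta * w, xi)) * w

def batch_estimate(F, data, x, delta, batch, rng):
    """Plain average of `batch` independent two-point estimates at x."""
    g = np.zeros_like(x)
    for _ in range(batch):
        xi = data[rng.integers(len(data))]
        g += zo_grad(F, x, xi, unit_sphere(rng, x.size), delta)
    return g / batch

def spider_correction(F, data, x_new, x_old, delta, batch, rng):
    """Average of g(x_new) - g(x_old), reusing the same (w, xi) at both points."""
    diff = np.zeros_like(x_new)
    for _ in range(batch):
        xi = data[rng.integers(len(data))]
        w = unit_sphere(rng, x_new.size)
        diff += zo_grad(F, x_new, xi, w, delta) - zo_grad(F, x_old, xi, w, delta)
    return diff / batch

def spider_zo(F, data, x0, eta, delta, iters, period, small_batch, big_batch, rng):
    """Recursive estimator: large batch every `period` steps, small corrections otherwise."""
    x, x_old, v = x0.copy(), x0.copy(), None
    for k in range(iters):
        if k % period == 0:
            v = batch_estimate(F, data, x, delta, big_batch, rng)
        else:
            v = v + spider_correction(F, data, x, x_old, delta, small_batch, rng)
        x_old, x = x, x - eta * v
    return x

# Toy problem (illustrative only): non-smooth l1 distance to random data points.
rng = np.random.default_rng(0)
data = rng.standard_normal((50, 5))
F = lambda x, c: np.abs(x - c).sum()
objective = lambda x: np.mean([F(x, c) for c in data])

x0 = np.full(5, 5.0)
x_final = spider_zo(F, data, x0, eta=0.01, delta=0.5, iters=1000,
                    period=10, small_batch=4, big_batch=32, rng=rng)
print(objective(x0), objective(x_final))
```

Reusing the same direction and sample at both points is what keeps the correction term small when consecutive iterates are close; the periodic large batch resets the accumulated error.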

Abstract

We consider decentralized gradient-free optimization of Lipschitz continuous functions that satisfy neither smoothness nor convexity assumptions. We propose two novel gradient-free algorithms, the Decentralized Gradient-Free Method (DGFM) and its variant, the Decentralized Gradient-Free Method$^+$ (DGFM$^{+}$). Based on the techniques of randomized smoothing and gradient tracking, DGFM requires the computation of the zeroth-order oracle of a single sample in each iteration, making it less demanding in terms of computational resources for individual computing nodes. Theoretically, DGFM achieves a complexity of $\mathcal O(d^{3/2}\delta^{-1}\varepsilon^{-4})$ for obtaining a $(\delta,\varepsilon)$-Goldstein stationary point. DGFM$^{+}$, an advanced version of DGFM, incorporates variance reduction to further improve the convergence behavior. It samples a mini-batch at each iteration and periodically draws a larger batch of data, which improves the complexity to $\mathcal O(d^{3/2}\delta^{-1}\varepsilon^{-3})$. Moreover, experimental results underscore the empirical advantages of our proposed algorithms when applied to real-world datasets.
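As a rough illustration of how randomized smoothing and gradient tracking fit together, here is a minimal sketch in the spirit of DGFM. It is not the paper's exact update rule: it assumes a generic gradient-tracking recursion $X^{k+1}=A(X^k-\eta V^k)$, $V^{k+1}=A(V^k+G^{k+1}-G^k)$ with a doubly stochastic mixing matrix $A$, a two-point zeroth-order estimator using one sample per node per iteration, and a toy objective. All function names, the ring topology, and the parameter values are hypothetical.

```python
import numpy as np

def zo_grad(F, x, xi, delta, rng):
    """Two-point estimate d/(2*delta) * (F(x+delta*w; xi) - F(x-delta*w; xi)) * w."""
    w = rng.standard_normal(x.size)
    w /= np.linalg.norm(w)  # uniform direction on the unit sphere
    return x.size / (2 * delta) * (F(x + delta * w, xi) - F(x - delta * w, xi)) * w

def dgfm_sketch(F, data, A, x0, eta, delta, iters, rng):
    """Gradient tracking with single-sample zeroth-order estimates on each node."""
    m = A.shape[0]
    X = np.tile(x0, (m, 1))  # row i holds node i's local iterate

    def sample_grads(X):
        # One fresh local sample and one fresh direction per node.
        return np.stack([
            zo_grad(F, X[i], data[i][rng.integers(len(data[i]))], delta, rng)
            for i in range(m)
        ])

    G = sample_grads(X)
    V = G.copy()  # tracking variables start at the first estimates
    for _ in range(iters):
        X = A @ (X - eta * V)      # local step followed by neighbor averaging
        G_new = sample_grads(X)
        V = A @ (V + G_new - G)    # track the network-average estimate
        G = G_new
    return X, V, G

# Toy setup (illustrative only): 4 nodes on a ring, l1 loss to local data points.
rng = np.random.default_rng(0)
m, n, d = 4, 20, 5
data = [rng.standard_normal((n, d)) for _ in range(m)]
F = lambda x, c: np.abs(x - c).sum()
P = np.roll(np.eye(m), 1, axis=1)
A = 0.5 * np.eye(m) + 0.25 * (P + P.T)  # symmetric, doubly stochastic

def objective(x):
    return np.mean([F(x, c) for C in data for c in C])

x0 = np.full(d, 5.0)
X, V, G = dgfm_sketch(F, data, A, x0, eta=0.01, delta=0.1, iters=2000, rng=rng)
x_bar = X.mean(axis=0)
print(objective(x0), objective(x_bar))
```

A useful sanity check on any gradient-tracking implementation of this form is that the average of the tracking variables always equals the average of the most recent local estimates, which follows from $A$ being doubly stochastic.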

Paper Structure (36 sections, 30 theorems, 112 equations, 2 figures, 3 tables, 4 algorithms)

Key Result

Proposition 2.1

For the zeroth-order oracle estimator in Definition 2.4, we have $\mathbb{E}_{w,\xi}[g(x;w,\xi)]=\nabla f_{\delta}(x)$ and $\mathbb{E}_{w,\xi}[\Vert g(x;w,\xi)\Vert^{2}]\leq 16\sqrt{2\pi}\,dL_{f}^{2}$.
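The two claims can be checked numerically on simple test functions. The sketch below assumes the estimator has the usual two-point form $g(x;w,\xi)=\frac{d}{2\delta}\big(F(x+\delta w;\xi)-F(x-\delta w;\xi)\big)w$ with $w$ uniform on the unit sphere; that form is not stated in this summary, so treat it as an assumption. For simplicity the test functions are deterministic (no $\xi$), and the function name `two_point_estimator` is hypothetical.

```python
import numpy as np

def two_point_estimator(f, x, delta, rng):
    """Assumed estimator: d/(2*delta) * (f(x + delta*w) - f(x - delta*w)) * w."""
    d = x.size
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)  # uniform direction on the unit sphere
    return d / (2.0 * delta) * (f(x + delta * w) - f(x - delta * w)) * w

rng = np.random.default_rng(0)
delta = 0.1

# Second-moment check on f(x) = ||x||_2, which is 1-Lipschitz and non-smooth at 0.
d, L_f = 100, 1.0
x = rng.standard_normal(d)
samples = np.array([two_point_estimator(np.linalg.norm, x, delta, rng)
                    for _ in range(5000)])
second_moment = np.mean(np.sum(samples ** 2, axis=1))
bound = 16 * np.sqrt(2 * np.pi) * d * L_f ** 2
print(second_moment, bound)

# Unbiasedness check on a linear function, where f_delta = f and the gradient is a.
a = rng.standard_normal(5)
linear = lambda z: a @ z
estimate = np.mean([two_point_estimator(linear, np.zeros(5), delta, rng)
                    for _ in range(20000)], axis=0)
print(np.max(np.abs(estimate - a)))
```

The linear case is chosen only because $\nabla f_{\delta}$ is known in closed form there; it says nothing about the non-smooth setting beyond confirming the scaling factor $d/(2\delta)$.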

Figures (2)

  • Figure 1: We assess the convergence performance of four algorithms by plotting the objective function value on the $y$-axis against the number of zeroth-order calls on the $x$-axis.
  • Figure 2: We assess the attacking performance of four algorithms by plotting the accuracy after attacking on the $y$-axis against the number of zeroth-order calls on the $x$-axis.

Theorems & Definitions (52)

  • Definition 2.1
  • Definition 2.2: $(\delta,\varepsilon)$-Goldstein Stationary Point
  • Definition 2.3: Randomized smoothing
  • Definition 2.4: Zeroth-order oracle estimators
  • Proposition 2.1: Lemma D.1 of [lin2022gradient]
  • Proposition 2.2: Proposition 2.2 of [chen2023faster]
  • Remark 1
  • Proposition 2.3
  • Lemma 3.1
  • Lemma 3.2: Consensus error decay
  • ...and 42 more
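
For orientation, the notions named in Definitions 2.2 and 2.3 are usually stated as follows in the non-smooth non-convex literature. This is the standard form, given here as an assumption; the paper's own statements may differ in notation. Here $\partial f$ denotes the Clarke subdifferential and $\mathbb{B}_{\delta}(x)$ the Euclidean ball of radius $\delta$ around $x$.

$$
\partial_{\delta} f(x) = \operatorname{conv}\Big(\bigcup_{y\in\mathbb{B}_{\delta}(x)} \partial f(y)\Big),
\qquad
x \text{ is } (\delta,\varepsilon)\text{-Goldstein stationary if } \min_{g\in\partial_{\delta} f(x)} \Vert g\Vert \le \varepsilon,
$$

$$
f_{\delta}(x) = \mathbb{E}_{u\sim\mathrm{Unif}(\mathbb{B}_{1}(0))}\big[f(x+\delta u)\big].
$$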