
Faster Gradient-Free Algorithms for Nonsmooth Nonconvex Stochastic Optimization

Lesi Chen, Jing Xu, Luo Luo

TL;DR

The paper tackles nonsmooth, nonconvex stochastic optimization using zeroth-order information and Goldstein stationarity. It introduces GFM$^+$, a gradient-free method that employs randomized smoothing and a variance-reduced recursive gradient estimator to achieve improved complexity: $\mathcal{O}\big(L^{3} d^{3/2}/\epsilon^{3} + \Delta L^{2} d^{3/2}/(\delta \epsilon^{3})\big)$ calls to the zeroth-order oracle for finding a $(\delta,\epsilon)$-Goldstein stationary point. The approach also extends to convex settings with warm-start variants (WS-GFM$^+$, WS-GFM), delivering favorable complexity bounds that leverage initialization to reduce dependence on $\Delta$ and the distance to the optimum. Empirical results on nonconvex penalized SVM and black-box CNN attacks corroborate the theoretical gains, showing faster convergence and higher attack success rates at comparable zeroth-order budgets. Overall, the work advances gradient-free nonsmooth stochastic optimization by matching or improving first-order rates up to a dimensional factor and clarifying the role of smoothing and variance reduction in zeroth-order settings.
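To make the two ingredients named above concrete, the following is a minimal Python sketch of a two-point zeroth-order estimator for the uniformly smoothed objective, combined with a recursive (SPIDER-style) variance-reduced update. It is an illustration under stated assumptions, not the authors' GFM$^+$ as specified in the paper: the function names, the `sample_xi(rng)` interface, the batch sizes, the restart period, and the step size are all hypothetical choices.

```python
import numpy as np


def sample_sphere(rng, d):
    """Draw a direction uniformly from the unit sphere in R^d."""
    w = rng.standard_normal(d)
    return w / np.linalg.norm(w)


def zo_grad(F, x, w, xi, delta):
    """Two-point zeroth-order estimate along direction w (uses two calls to F)."""
    d = x.size
    return d / (2.0 * delta) * (F(x + delta * w, xi) - F(x - delta * w, xi)) * w


def recursive_zo_descent(F, sample_xi, x0, delta, eta, T, period,
                         big_batch, small_batch, seed=0):
    """Hypothetical sketch: descent with a recursively updated estimator v.

    Every `period` steps v is reset with a large batch; in between it is
    corrected with small-batch differences evaluated at the current and
    previous iterates using the SAME random direction w and sample xi.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    x_prev, v = x.copy(), np.zeros(d)
    iterates = [x.copy()]
    for t in range(T):
        if t % period == 0:
            # Restart: fresh large-batch estimate at the current point.
            v = np.mean([zo_grad(F, x, sample_sphere(rng, d), sample_xi(rng), delta)
                         for _ in range(big_batch)], axis=0)
        else:
            # Recursive correction: shared randomness at x and x_prev.
            corr = np.zeros(d)
            for _ in range(small_batch):
                w, xi = sample_sphere(rng, d), sample_xi(rng)
                corr += zo_grad(F, x, w, xi, delta) - zo_grad(F, x_prev, w, xi, delta)
            v = v + corr / small_batch
        x_prev = x.copy()
        x = x - eta * v
        iterates.append(x.copy())
    return x, iterates


# Toy usage (assumed test problem): F(x; xi) = ||x - xi||_1 with small Gaussian xi.
d = 5
F = lambda x, xi: np.abs(x - xi).sum()
sample_xi = lambda rng: rng.normal(0.0, 0.1, size=d)
x_final, iterates = recursive_zo_descent(
    F, sample_xi, x0=2.0 * np.ones(d), delta=0.1, eta=0.02,
    T=300, period=10, big_batch=64, small_batch=8)
```

Reusing the same direction and sample at both points is what keeps the correction small when consecutive iterates are close; this sketch returns the last iterate and the full trajectory, and leaves the output rule (for example, sampling an iterate at random) to the caller.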

Abstract

We consider optimization problems of the form $\min_{x \in \mathbb{R}^d} f(x) \triangleq \mathbb{E}_{\xi} [F(x; \xi)]$, where the component $F(x;\xi)$ is $L$-mean-squared Lipschitz but possibly nonconvex and nonsmooth. The recently proposed gradient-free method requires at most $\mathcal{O}( L^4 d^{3/2} \epsilon^{-4} + \Delta L^3 d^{3/2} \delta^{-1} \epsilon^{-4})$ stochastic zeroth-order oracle calls to find a $(\delta,\epsilon)$-Goldstein stationary point of the objective function, where $\Delta = f(x_0) - \inf_{x \in \mathbb{R}^d} f(x)$ and $x_0$ is the initial point of the algorithm. This paper proposes a more efficient algorithm using stochastic recursive gradient estimators, which improves the complexity to $\mathcal{O}(L^3 d^{3/2} \epsilon^{-3}+ \Delta L^2 d^{3/2} \delta^{-1} \epsilon^{-3})$.
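As a quick check on the size of the improvement, dividing each term of the earlier bound by the corresponding term of the new bound gives the same ratio (constants hidden by $\mathcal{O}(\cdot)$ are ignored here):

```latex
\frac{L^{4} d^{3/2} \epsilon^{-4}}{L^{3} d^{3/2} \epsilon^{-3}}
  = \frac{L}{\epsilon},
\qquad
\frac{\Delta L^{3} d^{3/2} \delta^{-1} \epsilon^{-4}}
     {\Delta L^{2} d^{3/2} \delta^{-1} \epsilon^{-3}}
  = \frac{L}{\epsilon}.
```

So the stated bound is smaller by a factor of $L\epsilon^{-1}$ in both terms, while the dependence on $d$, $\delta$, and $\Delta$ is unchanged. For instance, with $L = 1$ and $\epsilon = 10^{-2}$ the bound shrinks by a factor of $100$.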

Paper Structure (22 sections, 16 theorems, 49 equations, 2 figures, 4 tables, 4 algorithms)

Key Result

Proposition 1

Under Assumption asm:Lip, for any $x, y \in \mathbb{R}^d$ it holds that $\mathbb{E}_{\xi} |F(x;\xi) - F(y;\xi)|^2 \le L^2 \|x - y\|^2$.

Figures (2)

  • Figure 1: For nonconvex penalized SVM, we present the results for complexity vs. loss on datasets "a9a", "w8a", "covtype", "ijcnn1", "mushrooms" and "phishing". The result for each method is averaged over 20 independent runs.
  • Figure 2: For black-box attack, we present the results for complexity vs. success rate on datasets "MNIST" and "Fashion-MNIST".

Theorems & Definitions (32)

  • Proposition 1: Mean-squared continuity
  • Definition 1: Clarke subdifferential
  • Definition 2: Approximate Clarke stationary point
  • Definition 3: Goldstein subdifferential
  • Definition 4: Approximate Goldstein stationary point
  • Definition 5: Uniform smoothing
  • Proposition 2
  • Definition 6: Zeroth-order gradient estimator
  • Definition 7: Mini-batch zeroth-order gradient estimator
  • Proposition 3: Lemma D.1 of lin2022gradient
  • ...and 22 more
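
For orientation, the notions named in Definitions 3 to 6 are commonly written as follows in the literature on nonsmooth nonconvex optimization. These are the standard forms, given here as an assumption about what the listed definitions contain; the paper's exact statements are not reproduced in this listing. Here $\partial f$ denotes the Clarke subdifferential, $\mathbb{B}_{\delta}(x)$ the Euclidean ball of radius $\delta$ around $x$, and $\mathbb{S}^{d-1}$ the unit sphere.

```latex
% Goldstein delta-subdifferential and (delta, epsilon)-Goldstein stationarity
\partial_{\delta} f(x) = \mathrm{conv}\Big( \bigcup_{y \in \mathbb{B}_{\delta}(x)} \partial f(y) \Big),
\qquad
\min \big\{ \|g\| : g \in \partial_{\delta} f(x) \big\} \le \epsilon .

% Uniform smoothing of f with radius delta
f_{\delta}(x) = \mathbb{E}_{u \sim \mathrm{Unif}(\mathbb{B}_{1}(0))} \big[ f(x + \delta u) \big].

% Two-point zeroth-order gradient estimator with w ~ Unif(S^{d-1})
g(x; w, \xi) = \frac{d}{2\delta} \big( F(x + \delta w; \xi) - F(x - \delta w; \xi) \big)\, w .
```

Each evaluation of $g(x; w, \xi)$ costs two calls to the stochastic zeroth-order oracle, which is the unit in which the complexity bounds above are counted.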