Faster Gradient-Free Algorithms for Nonsmooth Nonconvex Stochastic Optimization

Lesi Chen; Jing Xu; Luo Luo

TL;DR

The paper tackles nonsmooth, nonconvex stochastic optimization using zeroth-order information and Goldstein stationarity. It introduces GFM$^+$, a gradient-free method that employs randomized smoothing and a variance-reduced recursive gradient estimator to achieve improved complexity: $\mathcal{O}\big(L^{3} d^{3/2}/\epsilon^{3} + \Delta L^{2} d^{3/2}/(\delta \epsilon^{3})\big)$ calls to the zeroth-order oracle for finding a $(\delta,\epsilon)$-Goldstein stationary point. The approach also extends to convex settings with warm-start variants (WS-GFM$^+$, WS-GFM), delivering favorable complexity bounds that leverage initialization to reduce dependence on $\Delta$ and the distance to the optimum. Empirical results on nonconvex penalized SVM and black-box CNN attacks corroborate the theoretical gains, showing faster convergence and higher attack success rates at comparable zeroth-order budgets. Overall, the work advances gradient-free nonsmooth stochastic optimization by matching or improving first-order rates up to a dimensional factor and clarifying the role of smoothing and variance reduction in zeroth-order settings.
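The two ingredients named above can be sketched in a few lines. This is a minimal illustration, not the paper's GFM$^+$ pseudocode: it assumes the standard two-point estimator from the randomized-smoothing literature, $g = \frac{d}{2\delta}\big(F(x+\delta w;\xi) - F(x-\delta w;\xi)\big)w$ with $w$ uniform on the unit sphere, and a SPIDER/SARAH-style recursive correction with periodic large-batch restarts. All function names, the parameters `epoch_len`, `big_batch`, `small_batch`, and the choice to return the last iterate are assumptions made for illustration.

```python
import math
import random


def sample_unit_sphere(rng, d):
    """Draw w uniformly from the unit sphere in R^d by normalizing a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(gj * gj for gj in g))
    return [gj / norm for gj in g]


def zo_grad(F, x, w, xi, delta):
    """Two-point zeroth-order estimate of the gradient of the smoothed objective at x."""
    d = len(x)
    x_plus = [xj + delta * wj for xj, wj in zip(x, w)]
    x_minus = [xj - delta * wj for xj, wj in zip(x, w)]
    scale = d / (2.0 * delta) * (F(x_plus, xi) - F(x_minus, xi))
    return [scale * wj for wj in w]


def minibatch_zo_grad(F, x, samples, delta):
    """Average zo_grad over a list of (w, xi) pairs."""
    total = [0.0] * len(x)
    for w, xi in samples:
        g = zo_grad(F, x, w, xi, delta)
        total = [tj + gj for tj, gj in zip(total, g)]
    return [tj / len(samples) for tj in total]


def recursive_zo_descent(F, sample_xi, x0, delta, eta, epoch_len,
                         big_batch, small_batch, num_iters, seed=0):
    """Zeroth-order descent with a recursive (variance-reduced) gradient estimator.

    Assumption: restart schedule and batch sizes are illustrative, not the
    values prescribed by the paper's analysis.
    """
    rng = random.Random(seed)
    x = list(x0)
    d = len(x)
    x_prev = list(x)
    v = [0.0] * d

    def draw(n):
        return [(sample_unit_sphere(rng, d), sample_xi(rng)) for _ in range(n)]

    for t in range(num_iters):
        if t % epoch_len == 0:
            # Restart: fresh large-batch estimate at the current point.
            v = minibatch_zo_grad(F, x, draw(big_batch), delta)
        else:
            # Recursive step: reuse the SAME (w, xi) pairs at x and x_prev.
            samples = draw(small_batch)
            g_new = minibatch_zo_grad(F, x, samples, delta)
            g_old = minibatch_zo_grad(F, x_prev, samples, delta)
            v = [vj + gn - go for vj, gn, go in zip(v, g_new, g_old)]
        x_prev = list(x)
        x = [xj - eta * vj for xj, vj in zip(x, v)]
    return x
```

For example, with the nonsmooth test function `l1 = lambda x, xi: abs(x[0]) + abs(x[1])` and a dummy sampler `lambda rng: None`, calling `recursive_zo_descent(l1, lambda rng: None, [1.0, -1.0], delta=0.1, eta=0.01, epoch_len=5, big_batch=8, small_batch=2, num_iters=20)` moves the iterate toward the origin using function values only. Reusing the same random pairs at both points is what keeps the variance of the correction term small when consecutive iterates are close.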

Abstract

We consider the optimization problem of the form $\min_{x \in \mathbb{R}^d} f(x) \triangleq \mathbb{E}_{\xi} [F(x; \xi)]$, where the component $F(x;\xi)$ is $L$-mean-squared Lipschitz but possibly nonconvex and nonsmooth. The recently proposed gradient-free method requires at most $\mathcal{O}( L^4 d^{3/2} \epsilon^{-4} + \Delta L^3 d^{3/2} \delta^{-1} \epsilon^{-4})$ stochastic zeroth-order oracle calls to find a $(\delta,\epsilon)$-Goldstein stationary point of the objective function, where $\Delta = f(x_0) - \inf_{x \in \mathbb{R}^d} f(x)$ and $x_0$ is the initial point of the algorithm. This paper proposes a more efficient algorithm using stochastic recursive gradient estimators, which improves the complexity to $\mathcal{O}(L^3 d^{3/2} \epsilon^{-3}+ \Delta L^2 d^{3/2} \delta^{-1} \epsilon^{-3})$.
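As a quick check of the size of the improvement, dividing each term of the earlier bound by the corresponding term of the new bound gives the same factor (constants hidden by the $\mathcal{O}$-notation are ignored here):

$$
\frac{L^{4} d^{3/2} \epsilon^{-4}}{L^{3} d^{3/2} \epsilon^{-3}}
= \frac{\Delta L^{3} d^{3/2} \delta^{-1} \epsilon^{-4}}{\Delta L^{2} d^{3/2} \delta^{-1} \epsilon^{-3}}
= \frac{L}{\epsilon}.
$$

Both terms therefore shrink by a factor of $L/\epsilon$, while the dependence on $d$, $\delta$ and $\Delta$ is unchanged.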

Paper Structure (22 sections, 16 theorems, 49 equations, 2 figures, 4 tables, 4 algorithms)

Introduction
Preliminaries
Notation and Assumptions
Goldstein Stationary Point
Randomized Smoothing
Algorithms and Main Results
The Algorithms
Complexity Analysis
The Results for Convex Optimization
Numerical Experiments
Nonconvex Penalized SVM
Black-Box Attack on CNN
Conclusion
The Proof of Proposition \ref{prop:avgL}
The Tightness of $L_\delta$ and $M_\delta$
...and 7 more sections

Key Result

Proposition 1

Under Assumption asm:Lip, for any $x, y \in {\mathbb{R}}^d$ it holds that

Figures (2)

Figure 1: For nonconvex penalized SVM, we present the results for complexity vs. loss on datasets "a9a", "w8a", "covtype", "ijcnn1", "mushrooms" and "phishing". The result for each method is averaged over 20 independent runs.
Figure 2: For black-box attack, we present the results for complexity vs. success rate on datasets "MNIST" and "Fashion-MNIST".

Theorems & Definitions (32)

Proposition 1: mean-squared continuity
Definition 1: Clarke subdifferential
Definition 2: Approximate Clarke stationary point
Definition 3: Goldstein subdifferential
Definition 4: Approximate Goldstein stationary point
Definition 5: uniform smoothing
Proposition 2
Definition 6: zeroth-order gradient estimator
Definition 7: Mini-batch zeroth-order gradient estimator
Proposition 3: Lemma D.1 of lin2022gradient
...and 22 more
