Faster Gradient-Free Algorithms for Nonsmooth Nonconvex Stochastic Optimization
Lesi Chen, Jing Xu, Luo Luo
TL;DR
The paper tackles nonsmooth, nonconvex stochastic optimization using zeroth-order information and Goldstein stationarity. It introduces GFM$^+$, a gradient-free method that employs randomized smoothing and a variance-reduced recursive gradient estimator to achieve improved complexity: $\mathcal{O}\big(L^{3} d^{3/2}/\epsilon^{3} + \Delta L^{2} d^{3/2}/(\delta \epsilon^{3})\big)$ calls to the zeroth-order oracle for finding a $(\delta,\epsilon)$-Goldstein stationary point. The approach also extends to convex settings with warm-start variants (WS-GFM$^+$, WS-GFM), delivering favorable complexity bounds that leverage initialization to reduce dependence on $\Delta$ and the distance to the optimum. Empirical results on nonconvex penalized SVM and black-box CNN attacks corroborate the theoretical gains, showing faster convergence and higher attack success rates at comparable zeroth-order budgets. Overall, the work advances gradient-free nonsmooth stochastic optimization by matching or improving first-order rates up to a dimensional factor and clarifying the role of smoothing and variance reduction in zeroth-order settings.
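The two ingredients named above, randomized smoothing and a recursive variance-reduced estimator, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' GFM$^+$ implementation: the helper names (`sample_sphere`, `zo_grad`, `recursive_zo_descent`) and the parameters (`eta`, `q`, `b_big`, `b_small`) are hypothetical, the step size, batch sizes, and reset period are illustrative rather than the values from the paper's analysis, and the last iterate is returned for simplicity. The oracle `F(x, xi)` is assumed to return a scalar function value.

```python
import numpy as np

def sample_sphere(rng, n, d):
    """Draw n directions uniformly from the unit sphere in R^d."""
    w = rng.standard_normal((n, d))
    return w / np.linalg.norm(w, axis=1, keepdims=True)

def zo_grad(F, x, w, xi, delta):
    """Two-point zeroth-order estimate of the gradient of the
    delta-smoothed objective, averaged over paired rows of w and xi."""
    d = x.size
    diffs = np.array([F(x + delta * wi, s) - F(x - delta * wi, s)
                      for wi, s in zip(w, xi)])
    return (d / (2.0 * delta)) * (diffs[:, None] * w).mean(axis=0)

def recursive_zo_descent(F, sample_xi, x0, delta, eta, T, q,
                         b_big, b_small, seed=0):
    """Descent with a recursive (SPIDER/SARAH-style) gradient estimator.

    Every q steps the estimator v is reset with a large batch; in between
    it is corrected by the difference of two estimates that share the
    same directions w and the same samples xi.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    x_prev, v = x.copy(), np.zeros(d)
    for t in range(T):
        if t % q == 0:
            # Large-batch reset of the estimator.
            w, xi = sample_sphere(rng, b_big, d), sample_xi(rng, b_big)
            v = zo_grad(F, x, w, xi, delta)
        else:
            # Recursive correction with shared randomness at x and x_prev.
            w, xi = sample_sphere(rng, b_small, d), sample_xi(rng, b_small)
            v = v + zo_grad(F, x, w, xi, delta) - zo_grad(F, x_prev, w, xi, delta)
        x_prev, x = x, x - eta * v
    return x
```

Reusing the same `w` and `xi` at both `x` and `x_prev` is the point of the design: the difference of the two estimates is small whenever consecutive iterates are close, which is what keeps the variance of `v` under control between resets.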
Abstract
We consider the optimization problem of the form $\min_{x \in \mathbb{R}^d} f(x) \triangleq \mathbb{E}_{\xi} [F(x; \xi)]$, where the component $F(x;\xi)$ is $L$-mean-squared Lipschitz but possibly nonconvex and nonsmooth. The recently proposed gradient-free method requires at most $\mathcal{O}(L^4 d^{3/2} \epsilon^{-4} + \Delta L^3 d^{3/2} \delta^{-1} \epsilon^{-4})$ stochastic zeroth-order oracle calls to find a $(\delta,\epsilon)$-Goldstein stationary point of the objective function, where $\Delta = f(x_0) - \inf_{x \in \mathbb{R}^d} f(x)$ and $x_0$ is the initial point of the algorithm. This paper proposes a more efficient algorithm using stochastic recursive gradient estimators, which improves the complexity to $\mathcal{O}(L^3 d^{3/2} \epsilon^{-3} + \Delta L^2 d^{3/2} \delta^{-1} \epsilon^{-3})$.
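For readers unfamiliar with the stationarity notion used above, the standard definitions can be written out as follows. This is a reference sketch, not a quotation from the paper: the symbols $\partial_\delta f$, $\mathbb{B}_\delta(x)$ (the closed ball of radius $\delta$ around $x$), and $\partial f$ (the Clarke subdifferential) are assumed notation. A point $x$ is $(\delta,\epsilon)$-Goldstein stationary when the second condition holds.

$$
\partial_\delta f(x) = \mathrm{conv}\Big(\bigcup_{y \in \mathbb{B}_\delta(x)} \partial f(y)\Big),
\qquad
\min\big\{\|g\| : g \in \partial_\delta f(x)\big\} \le \epsilon .
$$

Comparing the two bounds term by term shows that the improvement is the same factor in both terms:

$$
\frac{L^4 d^{3/2} \epsilon^{-4}}{L^3 d^{3/2} \epsilon^{-3}} = \frac{L}{\epsilon},
\qquad
\frac{\Delta L^3 d^{3/2} \delta^{-1} \epsilon^{-4}}{\Delta L^2 d^{3/2} \delta^{-1} \epsilon^{-3}} = \frac{L}{\epsilon} .
$$

So the new bound saves a factor of $L/\epsilon$ overall, while the dependence on $d$, $\delta$, and $\Delta$ is unchanged.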
