Zeroth-order Gradient and Quasi-Newton Methods for Nonsmooth Nonconvex Stochastic Optimization
Luke Marrinan, Uday V. Shanbhag, Farzad Yousefian
TL;DR
This work tackles constrained nonsmooth nonconvex stochastic optimization by introducing a spherically smoothed zeroth-order framework. It analyzes two schemes: VRG-ZO, a variance-reduced zeroth-order gradient method, and VRSQN-ZO, a zeroth-order stochastic quasi-Newton method that combines randomized and Moreau smoothing. Both methods rely on the smoothed function f_{η}(x) = E_u[f(x+ηu)] and on Clarke stationarity to connect the smoothed and original problems, delivering almost-sure convergence together with explicit iteration and sample-complexity bounds. The approach yields practical algorithms with convergence guarantees for nonsmooth, nonconvex stochastic objectives and demonstrates competitiveness on logistics-style problems and quadratic tests, highlighting the potential of zeroth-order smoothing for challenging optimization settings.
Abstract
We consider the minimization of a Lipschitz continuous and expectation-valued function, denoted by $f$ and defined as $f(\mathbf{x}) \triangleq \mathbb{E}[\tilde{f}(\mathbf{x}, \boldsymbol{ξ})]$, over a closed and convex set $\mathcal{X}$. We obtain asymptotics as well as rate and complexity guarantees for computing approximate Clarke-stationary points via zeroth-order schemes. We adopt an approach reliant on minimizing $f_η$, where $f_η(\mathbf{x}) \triangleq \mathbb{E}_{\mathbf{u}}\left[f(\mathbf{x}+η\mathbf{u})\right]$, $\mathbf{u}$ is a random variable defined on a unit sphere, and $η > 0$. In fact, it is known that a stationary point of the $η$-smoothed problem is an $η$-stationary point for the original problem in the Clarke sense. In such a setting, we develop two schemes with promising empirical behavior. (I) We develop a variance-reduced zeroth-order gradient framework (VRG-ZO) for minimizing $f_η$ over $\mathcal{X}$. In this setting, we make two sets of contributions for the sequence generated by the proposed zeroth-order gradient scheme. (a) The residual function of the smoothed problem tends to zero almost surely along the generated sequence, guaranteeing $η$-Clarke stationary solutions of the original problem; (b) Computing an $\mathbf{x}$ such that the expected norm of the residual of the $η$-smoothed problem is within $ε$ requires no more than $\mathcal{O}(n^{1/2}(L_0η^{-1}+L_0^2)ε^{-2})$ projection steps and $\mathcal{O}(n^{3/2}(L_0^3η^{-2}+L_0^5)ε^{-4})$ function evaluations. (II) Our second scheme is a zeroth-order stochastic quasi-Newton scheme (VRSQN-ZO) reliant on randomized and Moreau smoothing; the iteration and sample complexities are $\mathcal{O}(L_0^{4}n^{2}η^{-4}ε^{-2})$ and $\mathcal{O}(L_0^{9}n^{5}η^{-5}ε^{-5})$, respectively.
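To make the idea of a zeroth-order projected gradient step on the smoothed problem concrete, the following Python sketch may help. It is an illustration under stated assumptions and not the authors' VRG-ZO scheme: it uses a commonly used two-point estimator with directions sampled uniformly on the unit sphere, takes $\mathcal{X}$ to be a box so that projection is a clip, and grows the mini-batch linearly as a simple stand-in for variance reduction. All function names, the noise model, the batch schedule, and the step size are hypothetical.

```python
import numpy as np


def sample_unit_sphere(n, rng):
    """Draw a direction uniformly from the unit sphere in R^n."""
    v = rng.standard_normal(n)
    return v / np.linalg.norm(v)


def zo_gradient(f_tilde, x, eta, batch_size, rng):
    """Mini-batch two-point zeroth-order gradient estimate.

    Assumption: this is a standard sphere-sampling estimator, chosen for
    illustration; it only calls f_tilde and never uses derivatives.
    """
    n = x.size
    g = np.zeros(n)
    for _ in range(batch_size):
        v = sample_unit_sphere(n, rng)
        xi = rng.standard_normal()  # hypothetical scalar noise sample
        diff = f_tilde(x + eta * v, xi) - f_tilde(x - eta * v, xi)
        g += (n / (2.0 * eta)) * diff * v
    return g / batch_size


def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n (an assumed choice of X)."""
    return np.clip(x, lo, hi)


def zo_projected_gradient(f_tilde, x0, eta, step, iters, lo, hi, seed=0):
    """Projected gradient iterations driven by zeroth-order estimates."""
    rng = np.random.default_rng(seed)
    x = project_box(np.asarray(x0, dtype=float), lo, hi)
    for k in range(iters):
        batch_size = k + 1  # growing batch: an assumed variance-reduction schedule
        g = zo_gradient(f_tilde, x, eta, batch_size, rng)
        x = project_box(x - step * g, lo, hi)
    return x


def noisy_l1(x, xi):
    """Hypothetical nonsmooth test objective: the l1 norm with multiplicative noise."""
    return (1.0 + 0.1 * xi) * np.abs(x).sum()
```

A call such as `zo_projected_gradient(noisy_l1, [1.0, -0.8, 0.5], eta=0.1, step=0.02, iters=200, lo=-1.0, hi=1.0)` runs the sketch on the noisy l1 objective. The smoothing parameter `eta` controls the trade-off visible in the complexity bounds above: a smaller `eta` gives a tighter approximation of the original problem at the cost of a larger bound.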
