Zeroth-order Gradient and Quasi-Newton Methods for Nonsmooth Nonconvex Stochastic Optimization

Luke Marrinan, Uday V. Shanbhag, Farzad Yousefian

TL;DR

This work tackles constrained nonsmooth nonconvex stochastic optimization by introducing a spherically smoothed zeroth-order framework. It analyzes two schemes: VRG-ZO, a variance-reduced zeroth-order gradient method, and VRSQN-ZO, a zeroth-order stochastic quasi-Newton method with Moreau smoothing. Both methods rely on the smoothed function $f_{\eta}(\mathbf{x}) = \mathbb{E}_{\mathbf{u}}[f(\mathbf{x}+\eta\mathbf{u})]$ and on Clarke-stationarity concepts to connect the smoothed and original problems, delivering almost-sure convergence together with explicit iteration and sample-complexity bounds. The resulting algorithms carry convergence guarantees for nonsmooth, nonconvex stochastic objectives and are reported to be competitive on logistics-style problems and quadratic tests.

Abstract

We consider the minimization of a Lipschitz continuous and expectation-valued function, denoted by $f$ and defined as $f(\mathbf{x}) \triangleq \mathbb{E}[\tilde{f}(\mathbf{x}, \boldsymbol{\xi})]$, over a closed and convex set $\mathcal{X}$. We obtain asymptotics as well as rate and complexity guarantees for computing approximate Clarke-stationary points via zeroth-order schemes. We adopt an approach reliant on minimizing $f_{\eta}$ where $f_{\eta}(\mathbf{x}) \triangleq \mathbb{E}_{\mathbf{u}}\left[f(\mathbf{x}+\eta\mathbf{u})\right]$, $\mathbf{u}$ is a random variable defined on a unit sphere, and $\eta > 0$. In fact, it is known that a stationary point of the $\eta$-smoothed problem is an $\eta$-stationary point for the original problem in the Clarke sense. In such a setting, we develop two schemes with promising empirical behavior. (I) We develop a variance-reduced zeroth-order gradient framework (VRG-ZO) for minimizing $f_{\eta}$ over $\mathcal{X}$. In this setting, we make two sets of contributions for the sequence generated by the proposed zeroth-order gradient scheme. (a) The residual function of the smoothed problem tends to zero almost surely along the generated sequence, guaranteeing $\eta$-Clarke stationary solutions of the original problem; (b) computing an $\mathbf{x}$ such that the expected norm of the residual of the $\eta$-smoothed problem is within $\epsilon$ requires no more than $\mathcal{O}(n^{1/2}(L_0\eta^{-1} + L_0^2)\epsilon^{-2})$ projection steps and $\mathcal{O}(n^{3/2}(L_0^3\eta^{-2} + L_0^5)\epsilon^{-4})$ function evaluations. (II) Our second scheme is a zeroth-order stochastic quasi-Newton scheme (VRSQN-ZO) reliant on randomized and Moreau smoothing; the iteration and sample complexities are $\mathcal{O}(L_0^{4} n^{2} \eta^{-4} \epsilon^{-2})$ and $\mathcal{O}(L_0^{9} n^{5} \eta^{-5} \epsilon^{-5})$, respectively.
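To make the smoothing idea concrete, the following is a minimal Python sketch of a projected zeroth-order gradient iteration. It is not the authors' VRG-ZO algorithm: it assumes the standard forward-difference estimator $(n/\eta)\,(\tilde{f}(\mathbf{x}+\eta\mathbf{v},\boldsymbol{\xi}) - \tilde{f}(\mathbf{x},\boldsymbol{\xi}))\,\mathbf{v}$ with $\mathbf{v}$ uniform on the unit sphere, uses a fixed mini-batch as a stand-in for variance reduction, and uses a constant step size. All function names, the toy objective, and the box constraint are hypothetical choices made for illustration.

```python
import numpy as np

def sample_unit_sphere(n, rng):
    """Draw a direction uniformly from the surface of the unit sphere in R^n."""
    v = rng.standard_normal(n)
    return v / np.linalg.norm(v)

def zo_gradient(f_sample, sample_xi, x, eta, batch_size, rng):
    """Mini-batch zeroth-order estimate of the gradient of the smoothed function.

    Assumption: forward-difference estimator with sphere-sampled directions;
    only function values of f_sample are used, never derivatives.
    """
    n = x.size
    g = np.zeros(n)
    for _ in range(batch_size):
        v = sample_unit_sphere(n, rng)
        xi = sample_xi(rng)  # same noise realization at both evaluation points
        g += (n / eta) * (f_sample(x + eta * v, xi) - f_sample(x, xi)) * v
    return g / batch_size

def projected_zo_descent(f_sample, sample_xi, project, x0, eta, step,
                         iters, batch_size, seed=0):
    """Projected iteration x <- Proj_X(x - step * g) with a zeroth-order g."""
    rng = np.random.default_rng(seed)
    x = project(np.asarray(x0, dtype=float))
    for _ in range(iters):
        g = zo_gradient(f_sample, sample_xi, x, eta, batch_size, rng)
        x = project(x - step * g)
    return x

# Hypothetical toy problem (not from the paper): a nonsmooth objective
# ||x||_1 plus a zero-mean linear noise term, over the box [-1, 1]^n.
def toy_f_sample(x, xi):
    return np.abs(x).sum() + xi @ x

def toy_sample_xi(rng, n=3, sigma=0.1):
    return sigma * rng.standard_normal(n)

def project_box(x, lo=-1.0, hi=1.0):
    return np.clip(x, lo, hi)
```

Calling `projected_zo_descent(toy_f_sample, toy_sample_xi, project_box, [0.8, -0.6, 0.5], eta=0.1, step=0.05, iters=200, batch_size=32)` runs the sketch on the toy problem. The smoothing radius `eta` trades bias against variance: a smaller `eta` tracks the original function more closely but inflates the $n/\eta$ factor in the estimator, which is consistent with the $\eta^{-1}$ and $\eta^{-2}$ terms in the complexity bounds above.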

Paper Structure

This paper contains 14 sections, 18 theorems, 93 equations, 1 figure, 9 tables, 3 algorithms.

Key Result

Proposition 2.2

Suppose $h$ is $L_0$-Lipschitz continuous on $\mathbb{R}^n$. Then the following hold.

Figures (1)

  • Figure 1: VRSQN-ZO vs. VRG-ZO on the minimum of two noise-afflicted quadratics.

Theorems & Definitions (37)

  • Definition 2.1: Directional derivatives and Clarke generalized gradient [clarke98]
  • Proposition 2.2: Properties of Clarke generalized gradients [clarke98]
  • Lemma 2.3: Lévy concentration on $\mathbb{S}^n$ [Wainwright2019]
  • Lemma 2.4: Properties of spherical smoothing
  • Proof 1
  • Proposition 2.5: Stationarity of $f_{\eta} \Rightarrow \eta$-Clarke stationarity of $f$
  • Definition 2.6: The residual mapping
  • Lemma 2.7 [CSY2021MPEC]
  • Lemma 2.8
  • Lemma 2.9: Robbins-Siegmund Lemma
  • ...and 27 more