Zeroth-order Gradient and Quasi-Newton Methods for Nonsmooth Nonconvex Stochastic Optimization

Luke Marrinan; Uday V. Shanbhag; Farzad Yousefian

TL;DR

This work tackles constrained nonsmooth nonconvex stochastic optimization by introducing a spherically smoothed zeroth-order framework. It analyzes two schemes: VRG-ZO, a variance-reduced zeroth-order gradient method, and VRSQN-ZO, a zeroth-order stochastic quasi-Newton method that additionally uses Moreau smoothing. Both methods work with the smoothed function $f_\eta(\mathbf{x}) = \mathbb{E}_{\mathbf{u}}[f(\mathbf{x}+\eta\mathbf{u})]$ and use Clarke stationarity to relate the smoothed problem to the original one. This yields almost-sure convergence together with explicit iteration and sample-complexity bounds. The result is a pair of practical algorithms with convergence guarantees for nonsmooth, nonconvex stochastic objectives. Both perform competitively on logistic regression problems and on a test problem built from the minimum of two noise-afflicted quadratics, highlighting the potential of zeroth-order smoothing in challenging optimization settings.
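
The TL;DR attributes Moreau smoothing to VRSQN-ZO, and the table of contents lists a smoothed unconstrained formulation. A common way to build such a formulation is to replace the constraint $\mathbf{x} \in \mathcal{X}$ by the Moreau envelope of the indicator of $\mathcal{X}$, which equals $\operatorname{dist}^2(\mathbf{x}, \mathcal{X})/(2\mu)$. We assume this construction here without confirming it against the paper. The Python sketch below evaluates the envelope for a box. `moreau_indicator`, `penalized_grad`, and the parameter `mu` are hypothetical names, and the box is chosen only because its projection $\Pi_{\mathcal{X}}$ is a coordinate-wise clip.

```python
import numpy as np


def project_box(x, lo, hi):
    """Euclidean projection onto the box X = {x : lo <= x_i <= hi}."""
    return np.clip(x, lo, hi)


def moreau_indicator(x, lo, hi, mu):
    """Value and gradient of the Moreau envelope of the indicator of X.

    For a closed convex set X and mu > 0, the envelope equals
    dist(x, X)^2 / (2 * mu). It is differentiable everywhere, with
    gradient (x - proj_X(x)) / mu, and that gradient is (1/mu)-Lipschitz.
    """
    diff = x - project_box(x, lo, hi)
    value = float(diff @ diff) / (2.0 * mu)
    grad = diff / mu
    return value, grad


def penalized_grad(grad_f_eta, x, lo, hi, mu):
    """Gradient of the hypothetical unconstrained surrogate f_eta(x) + env(x).

    grad_f_eta is an assumed callable returning (an estimate of) the
    gradient of the smoothed objective f_eta at x.
    """
    _, g_env = moreau_indicator(x, lo, hi, mu)
    return grad_f_eta(x) + g_env
```

For example, with $\mathcal{X} = [-1, 1]^2$ and $\mu = 0.5$, the point $(2, 0.5)$ has envelope value $1.0$ and gradient $(2, 0)$. The envelope's gradient $(\mathbf{x} - \Pi_{\mathcal{X}}(\mathbf{x}))/\mu$ is $(1/\mu)$-Lipschitz, so a quasi-Newton update can operate on a smooth unconstrained surrogate instead of handling projections directly.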

Abstract

We consider the minimization of a Lipschitz continuous and expectation-valued function, denoted by $f$ and defined as $f(\mathbf{x}) \triangleq \mathbb{E}[\tilde{f}(\mathbf{x}, \boldsymbol{\xi})]$, over a closed and convex set $\mathcal{X}$. We obtain asymptotics as well as rate and complexity guarantees for computing approximate Clarke-stationary points via zeroth-order schemes. Our approach relies on minimizing $f_\eta$, where $f_\eta(\mathbf{x}) \triangleq \mathbb{E}_{\mathbf{u}}[f(\mathbf{x}+\eta\mathbf{u})]$, $\mathbf{u}$ is a random variable defined on the unit sphere, and $\eta > 0$. It is known that a stationary point of the $\eta$-smoothed problem is an $\eta$-stationary point of the original problem in the Clarke sense. In this setting, we develop two schemes with promising empirical behavior. (I) We develop a variance-reduced zeroth-order gradient framework (VRG-ZO) for minimizing $f_\eta$ over $\mathcal{X}$ and make two sets of contributions for the sequence it generates. (a) The residual function of the smoothed problem tends to zero almost surely along the generated sequence, guaranteeing $\eta$-Clarke stationary solutions of the original problem. (b) Computing an $\mathbf{x}$ such that the expected norm of the residual of the $\eta$-smoothed problem is at most $\epsilon$ requires no more than $\mathcal{O}(n^{1/2}(L_0\eta^{-1}+L_0^2)\epsilon^{-2})$ projection steps and $\mathcal{O}(n^{3/2}(L_0^3\eta^{-2}+L_0^5)\epsilon^{-4})$ function evaluations. (II) Our second scheme is a zeroth-order stochastic quasi-Newton scheme (VRSQN-ZO) that relies on both randomized and Moreau smoothing. Its iteration and sample complexities are $\mathcal{O}(L_0^{4}n^{2}\eta^{-4}\epsilon^{-2})$ and $\mathcal{O}(L_0^{9}n^{5}\eta^{-5}\epsilon^{-5})$, respectively.
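
The abstract measures VRG-ZO's cost in projection steps and function evaluations. The Python sketch below shows how both quantities accrue for a box-shaped $\mathcal{X}$. It rests on several assumptions that are not taken from the paper:

- Directions $\mathbf{v}$ are drawn uniformly on the unit sphere.
- $\nabla f_\eta$ is estimated by the two-point formula $\frac{n}{2\eta}(\tilde{f}(\mathbf{x}+\eta\mathbf{v}, \xi) - \tilde{f}(\mathbf{x}-\eta\mathbf{v}, \xi))\mathbf{v}$. This estimator is unbiased for the common ball-smoothed convention, where $\mathbf{u}$ is uniform on the unit ball, so the paper's sphere-based definition may use a different scaling.
- The mini-batch grows as $N_k = k+1$, a hypothetical schedule standing in for the paper's variance reduction.
- The residual is the unscaled natural map $\mathbf{x} - \Pi_{\mathcal{X}}(\mathbf{x} - \nabla f_\eta(\mathbf{x}))$.

All function names are illustrative.

```python
import numpy as np


def sphere_direction(n, rng):
    """Draw v uniformly from the unit sphere in R^n (normalized Gaussian)."""
    v = rng.standard_normal(n)
    return v / np.linalg.norm(v)


def zo_grad(F, sample_xi, x, eta, batch, rng):
    """Mini-batch two-point estimate of grad f_eta(x).

    Each term evaluates the noisy oracle F at x + eta*v and x - eta*v with
    the SAME sample xi, then scales the difference by n / (2*eta).
    """
    n = x.size
    g = np.zeros(n)
    for _ in range(batch):
        v = sphere_direction(n, rng)
        xi = sample_xi(rng)
        diff = F(x + eta * v, xi) - F(x - eta * v, xi)
        g += (n / (2.0 * eta)) * diff * v
    return g / batch


def natural_residual(x, grad, lo, hi):
    """Unscaled natural residual x - Proj_X(x - grad) for the box X = [lo, hi]^n."""
    return x - np.clip(x - grad, lo, hi)


def vrg_zo_sketch(F, sample_xi, x0, lo, hi, eta=0.1, gamma=0.1, iters=200, seed=0):
    """Projected zeroth-order gradient steps with growing mini-batches.

    Returns the final iterate, the number of projections, and the number
    of oracle (function) evaluations.
    """
    rng = np.random.default_rng(seed)
    x = np.clip(np.asarray(x0, dtype=float), lo, hi)
    projections, evaluations = 0, 0
    for k in range(iters):
        batch = k + 1  # hypothetical schedule N_k = k + 1
        g = zo_grad(F, sample_xi, x, eta, batch, rng)
        evaluations += 2 * batch  # two oracle calls per sampled direction
        x = np.clip(x - gamma * g, lo, hi)  # one projection onto X per step
        projections += 1
    return x, projections, evaluations
```

Consider $\tilde{f}(\mathbf{x}, \xi) = \|\mathbf{x} - \mathbf{c}\|^2 + \xi$ over $[-1, 1]^5$ with $\mathbf{c} = (3, -3, 0.5, 0.5, 0.5)$. On this problem the iterates should approach the projection of $\mathbf{c}$, namely $(1, -1, 0.5, 0.5, 0.5)$. The additive noise cancels inside each difference because both evaluations share the same $\xi$. Each iteration costs one projection and $2N_k$ function evaluations, which is why the two counts scale differently.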

Paper Structure (14 sections, 18 theorems, 93 equations, 1 figure, 9 tables, 3 algorithms)

Introduction
Stationarity and smoothing
A randomized zeroth-order gradient method
Preliminaries
Convergence and rate analysis
A Smoothed Quasi-Newton Framework
A smoothed unconstrained formulation
Algorithm Description
Construction of the Inverse Hessian Approximation
Convergence Analysis
Numerical Results
Logistic Regression
Minimum of Two Noise-afflicted Quadratics
Concluding Remarks

Key Result

Proposition 2.2

Suppose $h$ is $L_0$-Lipschitz continuous on $\mathbb{R}^n$. Then the following hold.
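
The individual properties in Proposition 2.2 are not reproduced in this summary. The objects in Definition 2.1 can be illustrated with a standard one-dimensional example, which is not taken from the paper: $h(x) = |x|$, which is $L_0$-Lipschitz with $L_0 = 1$. The block below computes its Clarke directional derivative and generalized gradient at the kink.

```latex
% Clarke directional derivative and generalized gradient of h(x) = |x| on R
h^{\circ}(0; v) = \limsup_{y \to 0,\; t \downarrow 0} \frac{|y + t v| - |y|}{t} = |v|,
\qquad
\partial h(0) = \{\, g \in \mathbb{R} : g\,v \le h^{\circ}(0; v) \ \ \forall v \in \mathbb{R} \,\} = [-1, 1],
\qquad
\partial h(x) = \{ \operatorname{sign}(x) \} \ \text{ for } x \neq 0.
```

At the kink the generalized gradient is a whole interval rather than a single slope, and every element has magnitude at most $L_0 = 1$.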

Figures (1)

Figure 1: VRSQN-ZO vs. VRG-ZO on the minimum of two noise-afflicted quadratics.

Theorems & Definitions (37)

Definition 2.1: Directional derivatives and Clarke generalized gradient clarke98
Proposition 2.2: Properties of Clarke generalized gradients clarke98
Lemma 2.3: Lévy concentration on $\mathbb{S}^n$ Wainwright2019
Lemma 2.4: Properties of spherical smoothing
Proof 1
Proposition 2.5: Stationarity of $f_{\eta} \Rightarrow \eta$-Clarke stationarity of $f$
Definition 2.6: The residual mapping
Lemma 2.7: CSY2021MPEC
Lemma 2.8
Lemma 2.9: Robbins-Siegmund Lemma
...and 27 more
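
Lemma 2.9 is named above but not stated in this summary. For reference, the classical form of the Robbins–Siegmund lemma is given below. The paper's own statement may differ in notation or hypotheses.

```latex
% Classical Robbins--Siegmund lemma
\begin{aligned}
&\text{If } v_k, u_k, \alpha_k, \theta_k \ge 0 \text{ are } \mathcal{F}_k\text{-measurable and }
\mathbb{E}[v_{k+1} \mid \mathcal{F}_k] \le (1+\alpha_k)\, v_k - u_k + \theta_k \ \ \forall k,\\
&\text{with } \textstyle\sum_k \alpha_k < \infty \text{ and } \sum_k \theta_k < \infty \ \text{almost surely},\\
&\text{then } v_k \to v \text{ a.s. for some finite random variable } v \ge 0,
\ \text{and} \ \textstyle\sum_k u_k < \infty \ \text{a.s.}
\end{aligned}
```

In almost-sure analyses of this kind, the lemma is commonly applied with $v_k$ a nonnegative merit value, for instance $f_\eta(\mathbf{x}_k) - \inf_{\mathcal{X}} f_\eta$ when $f_\eta$ is bounded below on $\mathcal{X}$. The term $u_k$ is then taken proportional to the squared residual. Summability of $u_k$ forces the residual to zero along the sequence, which is the form of claim (a) in the abstract. This summary does not show whether the paper uses exactly these choices.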
