An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization

Guy Kornowski; Ohad Shamir

TL;DR

This work resolves an open question posed by Lin et al. (NeurIPS'22) in zero-order nonsmooth nonconvex stochastic optimization: it obtains $(\delta,\epsilon)$-stationary points using only noisy function evaluations with complexity $O(d\delta^{-1}\epsilon^{-3})$, which is linear in the dimension $d$ and optimal. The key ideas are randomized smoothing and a simple lemma on the Goldstein $\delta$-subdifferential that ties stationary points of the smoothed objective to those of the original function, enabling the use of a tight stochastic first-order method for Lipschitz objectives. The authors also develop parallel and high-probability variants that maintain optimal dependence on the accuracy parameters, and they show that the method adapts to the smooth case (recovering the $O(d\epsilon^{-4})$ rate when $f$ is smooth). These results close the gap between nonsmooth and smooth zero-order optimization in the stochastic setting and provide provably optimal algorithms.

Abstract

We study the complexity of producing $(\delta,\epsilon)$-stationary points of Lipschitz objectives which are possibly neither smooth nor convex, using only noisy function evaluations. Recent works proposed several stochastic zero-order algorithms that solve this task, all of which suffer from a dimension-dependence of $\Omega(d^{3/2})$ where $d$ is the dimension of the problem, which was conjectured to be optimal. We refute this conjecture by providing a faster algorithm that has complexity $O(d\delta^{-1}\epsilon^{-3})$, which is optimal (up to numerical constants) with respect to $d$ and also optimal with respect to the accuracy parameters $\delta,\epsilon$, thus solving an open question due to Lin et al. (NeurIPS'22). Moreover, the convergence rate achieved by our algorithm is also optimal for smooth objectives, proving that in the nonconvex stochastic zero-order setting, nonsmooth optimization is as easy as smooth optimization. We provide algorithms that achieve the aforementioned convergence rate in expectation as well as with high probability. Our analysis is based on a simple yet powerful lemma regarding the Goldstein-subdifferential set, which allows utilizing recent advancements in first-order nonsmooth nonconvex optimization.
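As a rough illustration of how noisy function evaluations can stand in for gradients, the sketch below implements the standard two-point randomized-smoothing estimator, whose expectation is the gradient of the ball-smoothed function $f_\delta$. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, and `f` is assumed to return a single (possibly noisy) evaluation that uses the same randomness at both query points.

```python
import numpy as np


def sample_unit_sphere(d, rng):
    """Draw a point uniformly from the unit sphere in R^d."""
    w = rng.standard_normal(d)
    return w / np.linalg.norm(w)


def two_point_gradient_estimate(f, x, delta, rng):
    """Estimate the gradient of the smoothed function f_delta at x.

    Uses two evaluations of f along a random direction w on the unit sphere:
        g = d / (2 * delta) * (f(x + delta * w) - f(x - delta * w)) * w
    (hypothetical helper, written for illustration only).
    """
    d = x.shape[0]
    w = sample_unit_sphere(d, rng)
    return (d / (2.0 * delta)) * (f(x + delta * w) - f(x - delta * w)) * w


def averaged_gradient_estimate(f, x, delta, rng, num_samples):
    """Average independent two-point estimates to reduce variance."""
    estimates = [
        two_point_gradient_estimate(f, x, delta, rng) for _ in range(num_samples)
    ]
    return np.mean(estimates, axis=0)
```

For an $L$-Lipschitz $f$, each single estimate has norm at most $dL$, since the two evaluations differ by at most $2\delta L$. A stochastic first-order method can then be run on $f_\delta$ with these estimates in place of stochastic gradients.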

Paper Structure (12 sections, 7 theorems, 39 equations, 3 algorithms)

Introduction
Preliminaries.
Notation.
Nonsmooth analysis.
Randomized smoothing.
Setting.
Algorithms and Main Results
Parallel complexity.
High probability guarantee.
Proof of Theorem \ref{thm: upper}
Proof of Theorem \ref{thm: high prob}
Concentration Lemma

Key Result

Proposition 3

For any $\rho\geq 0$: $\nabla f_\rho(\mathbf{x})\in\partial_\rho f(\mathbf{x})$. Hence, for $\rho=\delta$, if $\mathbf{x}$ is an $\epsilon$-stationary point of $f_\delta$, then it is a $(\delta,\epsilon)$-stationary point of $f$.
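The "Hence" step can be spelled out in one line. The sketch below assumes the standard definitions, which this summary does not restate: $f_\rho$ is the uniform-ball smoothing of $f$, $\partial f$ is the Clarke subdifferential, and $\partial_\delta f$ is the Goldstein $\delta$-subdifferential.

$$
f_\rho(\mathbf{x}) := \mathbb{E}_{\mathbf{u}\sim\mathrm{Unif}(\mathbb{B}(\mathbf{0},1))}\left[f(\mathbf{x}+\rho\mathbf{u})\right],
\qquad
\partial_\delta f(\mathbf{x}) := \mathrm{conv}\Big(\bigcup_{\mathbf{y}\in\mathbb{B}(\mathbf{x},\delta)}\partial f(\mathbf{y})\Big).
$$

A point $\mathbf{x}$ is $(\delta,\epsilon)$-stationary for $f$ when $\min_{\mathbf{g}\in\partial_\delta f(\mathbf{x})}\|\mathbf{g}\|\leq\epsilon$. Since $\nabla f_\delta(\mathbf{x})$ is itself an element of $\partial_\delta f(\mathbf{x})$, an $\epsilon$-stationary point of $f_\delta$ satisfies

$$
\min_{\mathbf{g}\in\partial_\delta f(\mathbf{x})}\|\mathbf{g}\| \;\leq\; \|\nabla f_\delta(\mathbf{x})\| \;\leq\; \epsilon .
$$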

Theorems & Definitions (11)

Definition 1
Proposition 3 (Lin et al., 2022, Theorem 3.1)
Lemma 4
Theorem 5
Theorem 6
proof: Proof of Lemma \ref{lem: delta,eps of smooth}
Lemma 7
proof
Theorem 8 (Cutkosky et al., 2023)
Lemma 9
...and 1 more
