Zeroth-Order Methods for Stochastic Nonconvex Nonsmooth Composite Optimization

Ziyi Chen, Peiran Yu, Heng Huang

TL;DR

This paper studies stochastic nonconvex nonsmooth composite optimization of the form $\phi(x)=F(x)+h(x)$ with $F(x)=\mathbb{E}_{\xi}[f_{\xi}(x)]$ and $h$ convex, addressing the lack of smoothness in key machine learning problems. It introduces two finite-time, zeroth-order stationary notions—proximal Goldstein stationary point (PGSP) and conditional gradient Goldstein stationary point (CGGSP)—based on the Goldstein $\delta$-subdifferential, and develops two zeroth-order methods: 0-PGD (proximal gradient) and 0-GCG (generalized conditional gradient). The paper provides convergence rates and function-evaluation complexities for both algorithms under minibatch and variance-reduced gradient estimations, showing that they can reach the proposed stationary points in finite time. Theoretical results are complemented by experiments on synthetic regularized ReLU networks and ResNet-20 on CIFAR-10, illustrating practical effectiveness and highlighting trade-offs between proximal steps and LMOs in different settings. Overall, the work broadens finite-time guarantees for nonconvex nonsmooth stochastic optimization by leveraging Goldstein-based stationarity and zeroth-order estimation.
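For orientation, the Goldstein $\delta$-subdifferential mentioned above is usually defined as follows for a locally Lipschitz $F$. This is the standard textbook form, stated here as an assumption; the paper's own definitions may use different notation or constants. Here $\partial F$ denotes the Clarke subdifferential and $\mathcal{B}_d(x,\delta)$ the Euclidean ball of radius $\delta$ around $x$:

$$\partial_\delta F(x) \overset{\rm def}{=} \mathrm{conv}\Big(\bigcup_{y\in\mathcal{B}_d(x,\delta)} \partial F(y)\Big).$$

A point $x$ is then commonly called a $(\delta,\epsilon)$-Goldstein stationary point of $F$ when

$$\min\{\|g\| : g\in\partial_\delta F(x)\} \le \epsilon.$$

The PGSP and CGGSP notions extend this idea to the composite objective $\phi=F+h$; their exact statements are not reproduced in this summary.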

Abstract

This work aims to solve a stochastic nonconvex nonsmooth composite optimization problem. Previous works on composite optimization problems require the major part to satisfy Lipschitz smoothness or some relaxed smoothness condition, which excludes machine learning examples such as the regularized ReLU network and the sparse support matrix machine. In this work, we focus on stochastic nonconvex composite optimization problems without any smoothness assumptions. In particular, we propose two new notions of approximate stationary point for such problems and obtain finite-time convergence results for two zeroth-order algorithms, one for each notion. Finally, we demonstrate the effectiveness of these algorithms with numerical experiments.
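To make the idea of a zeroth-order proximal method concrete, here is a minimal Python sketch. It is not the paper's 0-PGD: the two-point uniform-sphere gradient estimator, the choice $h(x)=\lambda\|x\|_1$, the deterministic objective, and all function names (`zo_gradient`, `prox_l1`, `zo_prox_gd`) and parameter values are assumptions made for illustration only.

```python
import numpy as np

def sample_unit_sphere(d, rng):
    # Uniform direction on the unit sphere via a normalized Gaussian vector.
    u = rng.standard_normal(d)
    return u / np.linalg.norm(u)

def zo_gradient(f, x, delta, batch, rng):
    # Assumed estimator: minibatch average of two-point finite differences
    # along random unit directions. It uses function values only.
    d = x.size
    g = np.zeros(d)
    for _ in range(batch):
        u = sample_unit_sphere(d, rng)
        g += (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u
    return g / batch

def prox_l1(v, t):
    # Proximal operator of t * ||.||_1 (soft thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def zo_prox_gd(f, x0, lam, step, delta, batch, iters, seed=0):
    # Hypothetical zeroth-order proximal gradient loop for F(x) + lam * ||x||_1.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        g = zo_gradient(f, x, delta, batch, rng)
        x = prox_l1(x - step * g, step * lam)
    return x

# Usage on a small nonsmooth toy objective (illustrative values only).
c = np.array([1.0, -2.0, 0.0])
f = lambda x: np.abs(x - c).sum()
x_hat = zo_prox_gd(f, np.zeros(3), lam=0.1, step=0.01, delta=0.01,
                   batch=8, iters=1000)
```

In a stochastic setting, `f` would be replaced by a sampled $f_{\xi}$ at each query. A conditional-gradient variant would swap the `prox_l1` step for a linear minimization oracle, which is the trade-off the experiments are said to highlight.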

Paper Structure

This paper contains 26 sections, 17 theorems, 58 equations, 3 figures, 2 tables, 2 algorithms.

Key Result

Proposition 1

Under Assumptions assum:f-assum:h2, the original objective function (eq:obj) has a non-empty solution set ${\arg\min}_{x\in\mathbb{R}^d}\phi(x)$, which is a subset of $\mathcal{B}_d(x^{(h)},R)\overset{\rm def}{=}\{x\in\mathbb{R}^d:\|x-x^{(h)}\|\le R\}$.

Figures (3)

  • Figure 1: Experimental results on regularized ReLU network.
  • Figure 2: Experimental results on regularized ResNet-20.
  • Figure : Zeroth-order generalized conditional gradient algorithm (0-GCG)

Theorems & Definitions (27)

  • Proposition 1
  • Definition 1
  • Definition 2: goldstein1977optimization
  • Definition 3: zhang2020complexity
  • Lemma 1: Proposition 2.3 of lin2022gradient
  • Definition 4
  • Definition 5
  • Proposition 2
  • Definition 6
  • Proposition 3
  • ...and 17 more