Accelerating Single-Point Zeroth-Order Optimization with Regression-Based Gradient Surrogates

Xin Chen, Zhaolin Ren

TL;DR

This work proposes regression-based SZO (ReSZO), a novel yet simple single-point zeroth-order (SZO) optimization framework. ReSZO constructs a surrogate function via regression on historical function evaluations and uses the gradient of this surrogate for iterative updates, which substantially improves the convergence rate.

Abstract

Zeroth-order optimization (ZO) is widely used for solving black-box optimization and control problems. In particular, single-point ZO (SZO) is well-suited to online or dynamic problem settings due to its requirement of only a single function evaluation per iteration. However, SZO suffers from high gradient estimation variance and slow convergence, which severely limit its practical applicability. To overcome this limitation, we propose a novel yet simple SZO framework termed regression-based SZO (ReSZO), which substantially enhances the convergence rate. Specifically, ReSZO constructs a surrogate function via regression using historical function evaluations and employs the gradient of this surrogate function for iterative updates. Two instantiations of ReSZO, which fit linear and quadratic surrogate functions respectively, are introduced. Moreover, we provide a non-asymptotic convergence analysis for the linear instantiation of ReSZO, showing that its convergence rates are comparable to those of two-point ZO methods. Extensive numerical experiments demonstrate that ReSZO empirically converges two to three times faster than two-point ZO in terms of function query complexity.
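
The regression idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's Algorithm 1: the function names, the Gaussian query perturbation, and the values chosen for the window size `m`, step size `eta`, and perturbation radius `delta` are all assumptions made for illustration. Each iteration spends exactly one function evaluation, fits a linear surrogate to the last `m` evaluations by least squares, and steps along the negative slope of that fit. The window needs `m > d` so that the `d + 1` regression coefficients are determined.

```python
import numpy as np
from collections import deque


def linear_surrogate_gradient(points, values):
    """Fit y ~ a + g.(x - mean) by least squares and return the slope g."""
    X = np.array(list(points), dtype=float)
    y = np.array(list(values), dtype=float)
    Xc = X - X.mean(axis=0)  # centering helps conditioning; the slope is unchanged
    A = np.hstack([np.ones((X.shape[0], 1)), Xc])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]  # drop the intercept, keep the gradient surrogate


def regression_szo_sketch(f, x0, steps=500, m=20, eta=0.02, delta=0.2, seed=0):
    """Illustrative single-point loop (hypothetical parameters, see lead-in)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    points, values = deque(maxlen=m), deque(maxlen=m)  # sliding history window
    for _ in range(steps):
        # Assumption: query a randomly perturbed point so the samples
        # in the window span all directions.
        q = x + delta * rng.standard_normal(x.size)
        points.append(q)
        values.append(f(q))  # the only function evaluation this iteration
        if len(points) < m:
            continue  # warm-up: collect m samples before the first update
        x = x - eta * linear_surrogate_gradient(points, values)
    return x


if __name__ == "__main__":
    target = np.array([1.0, -2.0, 0.5])
    f = lambda x: 0.5 * np.sum((x - target) ** 2)
    x_final = regression_szo_sketch(f, np.zeros(3))
    print("final objective:", f(x_final))
```

Because the window reuses past evaluations, the slope estimate draws on `m` samples while only one new query is paid for per iteration. On an exactly linear objective the fit recovers the true gradient; on curved objectives the estimate carries an error that grows with the spread of the points in the window.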

Paper Structure

This paper contains 17 sections, 11 theorems, 91 equations, 6 figures, 4 tables, 1 algorithm.

Key Result

Lemma 1

Suppose that Assumptions \ref{assumption:f_regularity} and \ref{assumption:key_analysis_condition} hold, and that $m > d$. Suppose that $\tilde{\eta}$ is chosen such that Condition \ref{condition:eta_tilde} is satisfied. Then, for any $0 < c < 1$, by choosing $\eta = C\frac{1}{m C_d L}$ for some sufficiently small ab [...] for any $t \geq m$.

Figures (6)

  • Figure 1: The convergence and 80% confidence intervals (CI) of TZO \ref{eq:tzo}, RSZO \ref{eq:rfszo}, L-ReSZO (Algorithm \ref{alg:lReSZO}) and Q-ReSZO \ref{eq:2update} for solving ridge regression \ref{eq:ridge}, logistic regression \ref{eq:logis}, Rosenbrock function minimization \ref{eq:rosen}, and neural network training \ref{eq:nn}.
  • Figure 2: Convergence of L-ReSZO and Q-ReSZO for solving the ridge regression problem \ref{eq:ridge} under different levels of perturbations and using the adaptive smoothing radius scheme \ref{eq:adaptive}.
  • Figure 3: Empirical distribution of the ratio $C_d$ for dimensions $d=100,400,900$ for the ridge regression problem in \ref{eq:ridge}. These plots are the sources from which the max/99-percentile/mean in Table \ref{tab:ridge_Cd} are computed.
  • Figure 4: Convergence of L-ReSZO on the ridge regression problem in \ref{eq:ridge} for dimensions $d=100,400,900$. The $C_d$ ratios shown in Figure \ref{fig:cd_plots_ridge} are based on the algorithmic trajectories here.
  • Figure 5: Empirical distribution of the ratio $C_d$ for dimensions $d=100,400,900$ for the logistic regression problem in \ref{eq:logis}. These plots are the sources from which the max/99-percentile/mean in Table \ref{tab:logistic_Cd} are computed.
  • ...and 1 more figure

Theorems & Definitions (21)

  • Remark 1
  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Theorem 2
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • proof: Theoretical justification for Assumption \ref{assumption:Dt_smallest_singular_value}
  • ...and 11 more