Optimization on black-box function by parameter-shift rule

Vu Tuan Hai

TL;DR

The paper tackles black-box optimization where the parameter–outcome relationship is opaque and traditional gradient access is unavailable. It adapts the parameter-shift rule (PSR), originally from quantum computing, into a zeroth-order gradient estimation method to reduce query counts and achieve favorable computational scaling. The authors apply the approach to a perceptron and to simple nonlinear functions, demonstrating high-fidelity gradient estimates that closely match analytic gradients. They discuss strategies for selecting PSR parameters $(r,\epsilon)$, including grid-search and potential one-dimensional reductions when $r=h(\epsilon)$, and outline future work for broader practical deployment.
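As a rough illustration of the idea, the sketch below assumes the two-term shift form $\hat{f}'(x) = r\,[f(x+\epsilon) - f(x-\epsilon)]$, which is how the parameter-shift rule is commonly written in quantum computing. The helper name `psr_gradient` and the choice of $\sin$ as the test function are illustrative assumptions, not taken from the paper. With $r = 1/2$ and $\epsilon = \pi/2$, the two queries recover the derivative of a sinusoid exactly, up to floating-point rounding.

```python
import math

def psr_gradient(f, x, r, eps):
    """Two-query estimate of f'(x), assuming the form r * (f(x + eps) - f(x - eps))."""
    return r * (f(x + eps) - f(x - eps))

# Assumed setting borrowed from quantum computing: r = 1/2, eps = pi/2.
# For f = sin this gives 0.5 * (cos(x) + cos(x)) = cos(x).
x = 0.3
estimate = psr_gradient(math.sin, x, r=0.5, eps=math.pi / 2)
exact = math.cos(x)
print(abs(estimate - exact))
```

For functions that are not sinusoidal, no single fixed pair $(r,\epsilon)$ is exact in general, which is why the choice of $(r,\epsilon)$ becomes a search problem.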

Abstract

Machine learning has been widely applied in many fields, but training machine learning models is becoming increasingly difficult. A growing number of optimization problems are "black-box" problems, where the relationship between model parameters and outcomes is uncertain or too complex to trace. Optimizing black-box models that require a large number of query observations and parameters is currently difficult. To overcome the drawbacks of existing algorithms, this study proposes a zeroth-order method that originated in quantum computing, the parameter-shift rule, which uses fewer parameters than previous methods.

Paper Structure

This paper contains 10 sections, 16 equations, 4 figures, 1 algorithm.

Figures (4)

  • Figure 1: The perceptron with activation function $g(w,b)$, which has $n$ nodes in the input layer and one node in the output layer.
  • Figure 2: The parameter space $T$ of $\{n_R,n_{\epsilon}\}$ when dealing with $f(x)=x^2+\cos(x+2)$. The $x$ and $y$ axes represent $n_R$ and $n_{\epsilon}$, respectively. Each point in $T$ generates a parameter space $\mathbb{S}$ of $\{r,\epsilon\}$. The value of each data point represents the minimum error that some grid points in $\mathbb{S}$ can achieve. Our objective is to minimize the error as much as possible.
  • Figure 3: The analytic gradient, the approximate gradient computed from the PSR, and the distance error value (right y-axis) between them on two non-linear functions.
  • Figure 4: The distance error when calculating the gradient of the perceptron.
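
The grid search described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under several assumptions that do not come from the paper: the estimator is taken to be $r\,[f(x+\epsilon) - f(x-\epsilon)]$, $n_R$ and $n_{\epsilon}$ are read as the number of grid points along each axis of $\mathbb{S}$, the grid ranges and sample points are arbitrary, and the error is the mean absolute difference from the analytic gradient. All function names are hypothetical.

```python
import math

def f(x):
    # Test function from the Figure 2 caption.
    return x**2 + math.cos(x + 2)

def f_prime(x):
    # Analytic gradient, used only to score candidate (r, eps) pairs.
    return 2 * x - math.sin(x + 2)

def psr_gradient(func, x, r, eps):
    # Assumed two-term shift estimator.
    return r * (func(x + eps) - func(x - eps))

def linspace(lo, hi, n):
    # Evenly spaced grid with both endpoints included (n >= 2).
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def grid_search(n_r, n_eps, xs):
    # Return (error, r, eps) for the grid point with the smallest mean error.
    best = (float("inf"), None, None)
    for r in linspace(0.5, 50.0, n_r):        # assumed range for r
        for eps in linspace(0.01, 1.0, n_eps):  # assumed range for eps
            err = sum(abs(psr_gradient(f, x, r, eps) - f_prime(x)) for x in xs) / len(xs)
            if err < best[0]:
                best = (err, r, eps)
    return best

xs = linspace(-3.0, 3.0, 25)
best_err, best_r, best_eps = grid_search(n_r=100, n_eps=100, xs=xs)
print(best_err, best_r, best_eps)
```

Under these assumptions the best grid points lie near the curve $r = 1/(2\epsilon)$, since that choice makes the estimator exact for the $x^2$ term and nearly exact for the cosine term when $\epsilon$ is small. This is one concrete instance of a one-dimensional reduction $r = h(\epsilon)$.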