Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient

Hao Di, Haishan Ye, Yueling Zhang, Xiangyu Chang, Guang Dai, Ivor W. Tsang

TL;DR

The paper tackles zeroth-order optimization for composite finite-sum objectives by addressing the persistent coordinate-wise variance that arises when only random gradient estimates are available. It introduces Zeroth-order Proximal Double Variance Reduction (ZPDVR), which combines Gaussian smoothing with an averaging-based gradient estimator and a loopless SVRG-style update to reduce both sampling and coordinate-wise variance without needing full first-order information. The authors prove linear convergence with an optimal zeroth-order query complexity of $O(d(n+\kappa)\log(1/\epsilon))$ and show that only $O(1)$ zeroth-order queries are needed per iteration in expectation. Empirical results on binary classification tasks demonstrate fast convergence and superior performance relative to baselines, supporting ZPDVR's practicality for high-dimensional, black-box optimization.

Abstract

Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating convergence rates of first-order (FO) and zeroth-order (ZO) optimization methods. However, in composite optimization problems, ZO methods encounter an additional variance called the coordinate-wise variance, which stems from the random gradient estimation. To reduce this variance, prior works require estimating all partial derivatives, essentially approximating FO information. This approach demands $\mathcal{O}(d)$ function evaluations (where $d$ is the dimension), which incurs substantial computational costs and is prohibitive in high-dimensional scenarios. This paper proposes the Zeroth-order Proximal Double Variance Reduction (ZPDVR) method, which utilizes the averaging trick to reduce both sampling and coordinate-wise variances. Compared to prior methods, ZPDVR relies solely on random gradient estimates, calls the stochastic zeroth-order oracle (SZO) $\mathcal{O}(1)$ times per iteration in expectation, and achieves the optimal $\mathcal{O}(d(n + \kappa)\log(\frac{1}{\epsilon}))$ SZO query complexity in the strongly convex and smooth setting, where $\kappa$ represents the condition number and $\epsilon$ is the desired accuracy. Empirical results validate ZPDVR's linear convergence and demonstrate its superior performance over other related methods.
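
The query-cost gap described in the abstract can be illustrated with a minimal sketch. It is not ZPDVR itself: it only contrasts a coordinate-wise forward-difference estimator, which touches every partial derivative, with a single-direction Gaussian random estimator. The names `CountingOracle`, `coordinate_wise_estimate`, `gaussian_random_estimate`, the toy quadratic, and the smoothing parameter `mu` are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


class CountingOracle:
    """Hypothetical wrapper that counts zeroth-order (function value) queries."""

    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.f(x)


def coordinate_wise_estimate(f, x, mu=1e-6):
    """Forward-difference estimate of all d partial derivatives (d + 1 queries)."""
    d = x.shape[0]
    fx = f(x)
    g = np.zeros(d)
    for j in range(d):
        e = np.zeros(d)
        e[j] = 1.0
        g[j] = (f(x + mu * e) - fx) / mu
    return g


def gaussian_random_estimate(f, x, rng, mu=1e-6):
    """Two-point estimate along one Gaussian direction u ~ N(0, I_d) (2 queries)."""
    u = rng.standard_normal(x.shape[0])
    return (f(x + mu * u) - f(x)) / mu * u


# Toy smooth objective (an assumption for the demo): f(x) = 0.5 * x^T A x.
d = 50
rng = np.random.default_rng(0)
A = np.diag(np.linspace(1.0, 5.0, d))


def quadratic(x):
    return 0.5 * x @ A @ x


x0 = rng.standard_normal(d)

oracle = CountingOracle(quadratic)
g_coord = coordinate_wise_estimate(oracle, x0)
coord_calls = oracle.calls

oracle = CountingOracle(quadratic)
g_rand = gaussian_random_estimate(oracle, x0, rng)
rand_calls = oracle.calls

print(coord_calls, rand_calls)
```

With $d = 50$ the coordinate-wise estimator issues 51 queries and the random estimator issues 2. The coordinate-wise estimate is close to the true gradient $Ax$, while a single random estimate is noisy, which is the coordinate-wise variance the abstract refers to.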

Paper Structure

This paper contains 11 sections, 16 theorems, 73 equations, 1 figure, 2 tables, 1 algorithm.

Key Result

Lemma 3.1

Let the random vector $u$ be drawn from the multivariate Gaussian distribution $\mathcal{N}(0, I_d)$. For the $L$-smooth function $f_i$ and any $x \in \mathbb{R}^d$, $i \in [n]$, the stochastic directional derivative estimator satisfies […], and its expectation w.r.t. $u$ is […], where $s_i(x, u)$ is a function of $x$ and $u$ within the range of $[0, 1]$, and $\tau_i(x, u)$ is the error term with […].
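
For orientation, the standard forward-difference Gaussian estimator and its textbook smoothness bound are sketched below. This is an assumption about the general shape of such results, not a reconstruction of the lemma: the smoothing parameter $\mu > 0$ and the remainder $r_i$ are symbols introduced here, and the paper's own estimator, $s_i(x, u)$, and $\tau_i(x, u)$ may be defined differently.

$$
\hat{\nabla} f_i(x, u) = \frac{f_i(x + \mu u) - f_i(x)}{\mu}\, u, \qquad u \sim \mathcal{N}(0, I_d).
$$

By $L$-smoothness, $f_i(x + \mu u) - f_i(x) = \mu \langle \nabla f_i(x), u \rangle + r_i(x, u)$ with $|r_i(x, u)| \le \frac{L \mu^2}{2} \|u\|^2$, so

$$
\hat{\nabla} f_i(x, u) = \langle \nabla f_i(x), u \rangle\, u + \frac{r_i(x, u)}{\mu}\, u, \qquad \left| \frac{r_i(x, u)}{\mu} \right| \le \frac{L \mu}{2} \|u\|^2 .
$$

Since $\mathbb{E}[u u^\top] = I_d$, the leading term satisfies $\mathbb{E}_u[\langle \nabla f_i(x), u \rangle\, u] = \nabla f_i(x)$, so the estimator's bias comes only from the remainder and vanishes as $\mu \to 0$.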

Figures (1)

  • Figure 1: Comparison of different zeroth-order methods: loss residual $F(x) - F(x^*)$ versus the number of SZO queries. The $y$ axis is on a logarithmic scale, and the $x$ axis shows the number of SZO queries divided by $nd$.

Theorems & Definitions (30)

  • Lemma 3.1
  • Corollary 3.2
  • Remark 3.3
  • Remark 3.4
  • Lemma 3.5
  • Remark 3.6
  • Lemma 4.1
  • Lemma 4.2
  • Lemma 4.3
  • Lemma 4.4
  • ...and 20 more