
Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms

Bin Gu, Xiyuan Wei, Hualin Zhang, Yi Chang, Heng Huang

TL;DR

A ZO objective decrease (ZOOD) property that incorporates two different types of errors into the upper bound on the convergence rate, plus two generic reduction frameworks for ZO optimization that automatically yield convergence results for convex and non-convex problems, respectively, whenever the inner solver's convergence rate satisfies the ZOOD property.

Abstract

Zeroth-order (ZO) optimization is a key technique for machine learning problems in which gradients are expensive or impossible to compute. Several variance-reduced ZO proximal algorithms have been proposed to speed up ZO optimization for non-smooth problems, and all of them use the coordinated ZO estimator rather than the random ZO estimator to approximate the true gradient, because the former is more accurate. Although the random ZO estimator introduces a larger error and makes convergence analysis more challenging than the coordinated ZO estimator, it requires only $\mathcal{O}(1)$ computation, significantly less than the $\mathcal{O}(d)$ computation of the coordinated ZO estimator, where $d$ is the dimension of the problem space. To exploit the computational efficiency of the random ZO estimator, we first propose a ZO objective decrease (ZOOD) property that can incorporate two different types of errors into the upper bound of the convergence rate. Next, we propose two generic reduction frameworks for ZO optimization that automatically derive convergence results for convex and non-convex problems, respectively, as long as the convergence rate of the inner solver satisfies the ZOOD property. Applying the two reduction frameworks to our proposed ZOR-ProxSVRG and ZOR-ProxSAGA, two variance-reduced ZO proximal algorithms with fully random ZO estimators, we improve the state-of-the-art function query complexities from $\mathcal{O}\left(\min\left\{\frac{dn^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right\}\right)$ to $\tilde{\mathcal{O}}\left(\frac{n+d}{\epsilon^2}\right)$ when $d > n^{1/2}$ for non-convex problems, and from $\mathcal{O}\left(\frac{d}{\epsilon^2}\right)$ to $\tilde{\mathcal{O}}\left(n\log\frac{1}{\epsilon}+\frac{d}{\epsilon}\right)$ for convex problems.
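The cost comparison above ($\mathcal{O}(1)$ versus $\mathcal{O}(d)$ function queries per gradient estimate) can be made concrete with a small NumPy sketch. It assumes two common textbook forms, a coordinate-wise central-difference estimator and a random-direction estimator with the direction drawn uniformly from the unit sphere; the paper's exact estimators and smoothing parameters may differ. The names `CountingOracle`, `coord_zo_grad`, `random_zo_grad` and `prox_l1` are hypothetical, and the quadratic test function and $\ell_1$ regularizer are illustrative choices. No variance reduction is shown, so this is not ZOR-ProxSVRG or ZOR-ProxSAGA.

```python
import numpy as np


class CountingOracle:
    """Hypothetical wrapper: a black-box function that counts its own queries."""

    def __init__(self, f):
        self.f = f
        self.queries = 0

    def __call__(self, x):
        self.queries += 1
        return self.f(x)


def coord_zo_grad(f, x, mu=1e-4):
    """Coordinate-wise central differences: 2*d function queries per estimate."""
    d = x.size
    g = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = mu
        g[i] = (f(x + e) - f(x - e)) / (2.0 * mu)
    return g


def random_zo_grad(f, x, rng, mu=1e-4):
    """One random direction u, uniform on the unit sphere: 2 queries regardless of d.
    As mu -> 0 the expectation approaches the true gradient, because E[u u^T] = I/d."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    return d * (f(x + mu * u) - f(x)) / mu * u


def prox_l1(v, step, lam):
    """Proximal mapping of step*lam*||.||_1, i.e. soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)


# Demo on a smooth quadratic whose true gradient at x is x itself.
rng = np.random.default_rng(0)
d = 50
x = rng.standard_normal(d)
f = CountingOracle(lambda z: 0.5 * float(z @ z))

g_coord = coord_zo_grad(f, x)
coord_queries = f.queries  # 2 * d = 100 queries

f.queries = 0
g_rand = random_zo_grad(f, x, rng)
rand_queries = f.queries  # 2 queries, independent of d

# One illustrative ZO proximal gradient step for f(x) + lam*||x||_1 using the cheap estimate.
eta, lam = 0.1, 0.01
x_next = prox_l1(x - eta * g_rand, eta, lam)
```

Averaging many calls to `random_zo_grad` recovers the gradient of the quadratic, but any single call is noisy: a larger error in exchange for a per-estimate cost that does not grow with $d$ is the trade-off described above.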

Paper Structure (15 sections, 24 theorems, 113 equations, 4 figures, 2 tables, 4 algorithms)

Key Result

Theorem 1

Suppose Assumptions a61, a4, and a2 are satisfied. Let $\mathbf{x}_0$ be an initial point such that $F(\mathbf{x}_0) - F(\mathbf{x}^*) \leq \Delta$ and $\|\mathbf{x}_0 - \mathbf{x}^*\|^2 \leq \Theta$. For Algorithm algo2, if the inner algorithm $\mathcal{A}$ satisfies the ZOOD property, we have

Figures (4)

  • Figure 1: Comparison of coordinated and random ZO estimators for variance reduced ZO proximal gradient algorithms.
  • Figure 2: Principle of our AdaptRdct-C and AdaptRdct-NC.
  • Figure 3: Comparison of black-box attack methods on three well-trained DNNs, with $\sigma = 10^{-3}$.
  • Figure 4: Comparison of different ZO algorithms for logistic regression problems. (a)-(c) Convex. (d)-(f) Non-convex. Panels (a) and (d) plot the residual error (i.e., $F(x)-F(x^*)$) on a log scale.

Theorems & Definitions (50)

  • Definition 1
  • Definition 2: $\epsilon$-Stationary Point
  • Definition 3: Moreau Envelope and Proximal Mapping
  • Remark 1
  • Definition 4: ZOOD Property
  • Theorem 1
  • Remark 2
  • Corollary 1
  • Theorem 2
  • Remark 3
  • ...and 40 more