Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms

Bin Gu; Xiyuan Wei; Hualin Zhang; Yi Chang; Heng Huang

TL;DR

The paper proposes a ZO objective decrease (ZOOD) property that can incorporate two different types of errors in the upper bound of the convergence rate, together with two generic reduction frameworks for ZO optimization. The frameworks automatically derive convergence results for convex and non-convex problems, respectively, as long as the convergence rate of the inner solver satisfies the ZOOD property.

Abstract

Zeroth-order (ZO) optimization is a key technique for machine learning problems where gradient calculation is expensive or impossible. Several variance-reduced ZO proximal algorithms have been proposed to speed up ZO optimization for non-smooth problems, and all of them chose the coordinated ZO estimator over the random ZO estimator when approximating the true gradient, since the former is more accurate. Although the random ZO estimator introduces a larger error and makes convergence analysis more challenging than the coordinated ZO estimator does, it requires only $\mathcal{O}(1)$ computation, significantly less than the $\mathcal{O}(d)$ computation of the coordinated ZO estimator, where $d$ is the dimension of the problem space. To take advantage of the computational efficiency of the random ZO estimator, we first propose a ZO objective decrease (ZOOD) property that can incorporate two different types of errors in the upper bound of the convergence rate. Next, we propose two generic reduction frameworks for ZO optimization that automatically derive convergence results for convex and non-convex problems, respectively, as long as the convergence rate of the inner solver satisfies the ZOOD property. By applying the two reduction frameworks to our proposed ZOR-ProxSVRG and ZOR-ProxSAGA, two variance-reduced ZO proximal algorithms with fully random ZO estimators, we improve the state-of-the-art function query complexities from $\mathcal{O}\left(\min\left\{\frac{dn^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right\}\right)$ to $\tilde{\mathcal{O}}\left(\frac{n+d}{\epsilon^2}\right)$ under $d > n^{\frac{1}{2}}$ for non-convex problems, and from $\mathcal{O}\left(\frac{d}{\epsilon^2}\right)$ to $\tilde{\mathcal{O}}\left(n\log\frac{1}{\epsilon}+\frac{d}{\epsilon}\right)$ for convex problems.
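The query-cost gap between the two estimator families can be illustrated with a minimal sketch. It assumes textbook forms of the estimators (central differences along each coordinate axis, and a forward difference along one random unit direction scaled by $d$); the paper's exact definitions may differ. The function names `coord_zo_grad` and `random_zo_grad` and the smoothing parameter `mu` are hypothetical, chosen only for illustration.

```python
import numpy as np


def coord_zo_grad(f, x, mu=1e-5):
    """Coordinate-wise ZO estimator (assumed textbook form).

    Uses a central difference along every coordinate axis, so it
    spends 2 * d function queries per gradient estimate.
    """
    d = x.size
    g = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = 1.0
        g[i] = (f(x + mu * e) - f(x - mu * e)) / (2.0 * mu)
    return g, 2 * d  # (estimate, number of function queries)


def random_zo_grad(f, x, rng, mu=1e-5):
    """Random-direction ZO estimator (assumed textbook form).

    Uses a forward difference along one direction drawn uniformly
    from the unit sphere, so it spends 2 function queries regardless
    of the dimension d. The factor d makes the estimate approximately
    unbiased for small mu, at the price of a much larger variance.
    """
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    g = d * (f(x + mu * u) - f(x)) / mu * u
    return g, 2  # (estimate, number of function queries)
```

On a smooth test function such as $f(\mathbf{x}) = \frac{1}{2}\|\mathbf{x}\|^2$, a single call to `coord_zo_grad` recovers the gradient almost exactly, whereas a single call to `random_zo_grad` is noisy and only matches the gradient after averaging many independent draws. This is the accuracy-versus-cost trade-off the abstract describes.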

Paper Structure (15 sections, 24 theorems, 113 equations, 4 figures, 2 tables, 4 algorithms)

Introduction
Related Work
Preliminaries
Assumptions
ZO Gradient Estimation
Zeroth-Order Reduction Frameworks
AdaptRdct-C
AdaptRdct-NC
Applications on Lightweight Variance Reduced ZO Proximal Algorithms
Lightweight Variance Reduced ZO Proximal Algorithms
Applying Reduction Frameworks on ZOR-ProxSVRG and ZOR-ProxSAGA
Experimental Results
Generation of Black-Box Adversarial Examples
Convex and Nonconvex Logistic Regression
Conclusion

Key Result

Theorem 1

Suppose Assumptions a61, a4 and a2 are satisfied. Let $\mathbf{x}_0$ be an initial point such that $F(\mathbf{x}_0) - F(\mathbf{x}^*) \leq \Delta$ and $\|\mathbf{x}_0 - \mathbf{x}^*\|^2 \leq \Theta$. For Algorithm algo2, if the inner algorithm $\mathcal{A}$ satisfies the ZOOD property, we have

Figures (4)

Figure 1: Comparison of coordinated and random ZO estimators for variance reduced ZO proximal gradient algorithms.
Figure 2: Principle of our AdaptRdct-C and AdaptRdct-NC.
Figure 3: Comparison of black-box attack methods on three well-trained DNNs, with $\sigma = 10^{-3}$.
Figure 4: Comparison of different ZO algorithms for logistic regression problems. (a)-(c) Convex. (d)-(f) Non-convex. Panels (a) and (d) plot the residual error (i.e., $F(x)-F(x^*)$) on a log scale.

Theorems & Definitions (50)

Definition 1
Definition 2: $\epsilon$-Stationary Point
Definition 3: Moreau Envelope and Proximal Mapping
Remark 1
Definition 4: ZOOD Property
Theorem 1
Remark 2
Corollary 1
Theorem 2
Remark 3
...and 40 more
