Zeroth-Order Hard-Thresholding: Gradient Error vs. Expansivity

William de Vazelhes; Hualin Zhang; Huimin Wu; Xiao-Tong Yuan; Bin Gu


TL;DR

This paper proposes a new stochastic zeroth-order (ZO) gradient hard-thresholding algorithm (SZOHT), built on a general ZO gradient estimator that uses a novel random support sampling. It also reveals a conflict between the deviation of ZO estimators and the expansivity of the hard-thresholding operator.

Abstract

$\ell_0$ constrained optimization is prevalent in machine learning, particularly for high-dimensional problems, because it is a fundamental approach to achieve sparse learning. Hard-thresholding gradient descent is a dominant technique to solve this problem. However, first-order gradients of the objective function may be either unavailable or expensive to calculate in a lot of real-world problems, where zeroth-order (ZO) gradients could be a good surrogate. Unfortunately, whether ZO gradients can work with the hard-thresholding operator is still an unsolved problem. To solve this puzzle, in this paper, we focus on the $\ell_0$ constrained black-box stochastic optimization problems, and propose a new stochastic zeroth-order gradient hard-thresholding (SZOHT) algorithm with a general ZO gradient estimator powered by a novel random support sampling. We provide the convergence analysis of SZOHT under standard assumptions. Importantly, we reveal a conflict between the deviation of ZO estimators and the expansivity of the hard-thresholding operator, and provide a theoretical minimal value of the number of random directions in ZO gradients. In addition, we find that the query complexity of SZOHT is independent or weakly dependent on the dimensionality under different settings. Finally, we illustrate the utility of our method on a portfolio optimization problem as well as black-box adversarial attacks.
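As a rough illustration of the ingredients named in the abstract, the following Python sketch combines a hard-thresholding operator with a zeroth-order gradient estimate built from $q$ random directions, each drawn on a random support of size $s_2$. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the $d/(q\mu)$ scaling, the uniform sampling on the unit sphere restricted to the support, the deterministic objective (no stochastic $\bm{\xi}$), and all parameter names (`k`, `mu`, `step`) are assumptions made for illustration.

```python
import numpy as np


def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and set the rest to zero."""
    out = np.zeros_like(x)
    if k > 0:
        idx = np.argpartition(np.abs(x), -k)[-k:]
        out[idx] = x[idx]
    return out


def zo_gradient(f, x, q, s2, mu, rng):
    """Zeroth-order gradient estimate from q random directions.

    Each direction is a unit vector supported on a random subset of s2
    coordinates. The d / (q * mu) scaling is an assumption of this sketch.
    """
    d = x.size
    fx = f(x)
    g = np.zeros(d)
    for _ in range(q):
        support = rng.choice(d, size=s2, replace=False)
        v = rng.standard_normal(s2)
        u = np.zeros(d)
        u[support] = v / np.linalg.norm(v)  # uniform on the sphere restricted to the support
        g += (f(x + mu * u) - fx) * u       # forward finite difference along u
    return (d / (q * mu)) * g


def szoht_sketch(f, d, k, q, s2, mu, step, iters, seed=0):
    """Alternate a ZO gradient step with hard-thresholding to k nonzeros."""
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    for _ in range(iters):
        x = hard_threshold(x - step * zo_gradient(f, x, q, s2, mu, rng), k)
    return x
```

Each iteration of this sketch costs $q + 1$ function queries. A larger $q$ lowers the error of the gradient estimate, which is the quantity the abstract sets against the expansivity of the hard-thresholding step.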


Paper Structure (35 sections, 12 theorems, 112 equations, 10 figures, 1 table, 1 algorithm)


Introduction
Contributions.
Preliminaries
Algorithm
Random support Zeroth-Order estimate
SZOHT Algorithm
Convergence analysis
Weak/non dependence on dimensionality of the query complexity.
Experiments
Sensitivity analysis
Baselines
Applications
Sparse asset risk management
Few pixels adversarial attacks
Results and Discussion
...and 20 more sections

Key Result

Proposition 1

(Proof in Appendix sec:proof_final_zo) Let us consider any support $F\subset [d]$ of size $s$ ($|F|=s$). For the ZO gradient estimator in eq:zoest, with $q$ random directions, and random supports of size $s_2$, and assuming that each $f_{\bm{\xi}}$ is $(L_{s_2}, s_2)$-RSS, we have, with $\hat{\nabl

Figures (10)

Figure 1: Conflict between the hard-thresholding operator and the zeroth-order estimate.
Figure 2: Sensitivity analysis
Figure 3: $f(\bm{x})$ vs. # queries (asset management)
Figure 4: $f(\bm{x})$ vs. # queries (adversarial attack)
Figure 5: $\nabla f(x)$ and $\hat{\nabla} f(x)$ and their projections $\nabla_F f(x)$ and $\hat{\nabla}_F f(x)$ onto $F$
...and 5 more figures

Theorems & Definitions (29)

Remark 1
Proposition 1
Theorem 1
Remark 2: System error
Remark 3: Generality
Remark 4: Some necessary condition on $q$ (proof in sec:firstcond)
Corollary 1: RSS $f_{\bm{\xi}}$ (proof in Appendix sec:proof_specialq)
Corollary 2: Smooth $f_{\bm{\xi}}$ (proof in Appendix sec:proof_s2d)
Lemma B.1: Sykora2005 (10)
proof
...and 19 more
