Gradient-Free Method for Heavily Constrained Nonconvex Optimization

Wanli Shi; Hongchang Gao; Bin Gu

TL;DR

A doubly stochastic zeroth-order gradient method (DSZOG) with momentum and an adaptive step size is proposed, and it is proved that DSZOG converges to an $\epsilon$-stationary point of the constrained problem.

Abstract

Zeroth-order (ZO) methods have been shown to be powerful for optimization problems in which an explicit expression of the gradient is difficult or infeasible to obtain. Recently, owing to the practical value of constrained problems, many ZO Frank-Wolfe and projected ZO methods have been proposed. However, many applications involve a very large number of nonconvex white/black-box constraints, which makes existing zeroth-order methods extremely inefficient (or even inapplicable), since they need to query the function values of all the constraints and project the solution onto a complicated feasible set. In this paper, to solve nonconvex problems with a large number of white/black-box constraints, we propose a doubly stochastic zeroth-order gradient method (DSZOG) with momentum and an adaptive step size. Theoretically, we prove that DSZOG converges to an $\epsilon$-stationary point of the constrained problem. Experimental results in two applications demonstrate the superiority of our method in terms of training time and accuracy compared with other ZO methods for constrained problems.
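
To make the idea of a zeroth-order update with randomly sampled constraints concrete, the following is a minimal Python sketch. It is not the DSZOG algorithm from the paper: it assumes a standard two-point Gaussian-smoothing gradient estimator, a quadratic penalty on one uniformly sampled constraint per step, plain momentum, and a fixed step size (no adaptive step size, and no sampling of data). All names (`zo_gradient`, `penalized_zo_descent`, `beta`, `mu`, `gamma`) and the toy problem are hypothetical and chosen only for illustration.

```python
import numpy as np


def zo_gradient(f, w, mu=1e-4, num_dirs=10, rng=None):
    """Estimate grad f(w) from function values only (two-point Gaussian smoothing)."""
    rng = np.random.default_rng() if rng is None else rng
    f0 = f(w)
    grad = np.zeros_like(w, dtype=float)
    for _ in range(num_dirs):
        u = rng.standard_normal(w.shape)           # random search direction
        grad += (f(w + mu * u) - f0) / mu * u      # finite difference along u
    return grad / num_dirs


def penalized_zo_descent(objective, constraints, w0, beta=10.0, lr=0.01,
                         gamma=0.9, steps=2000, seed=0):
    """Illustrative loop: sample ONE constraint per step, penalize its violation,
    and take a momentum step along a zeroth-order gradient estimate.
    (Assumed simplification; not the authors' update rule.)"""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    momentum = np.zeros_like(w)
    for _ in range(steps):
        c_j = constraints[rng.integers(len(constraints))]  # sampled constraint

        def sampled_loss(x, c=c_j):
            # Objective plus quadratic penalty on the sampled constraint c(x) <= 0.
            return objective(x) + beta * max(c(x), 0.0) ** 2

        g = zo_gradient(sampled_loss, w, rng=rng)
        momentum = gamma * momentum + (1.0 - gamma) * g    # exponential averaging
        w = w - lr * momentum
    return w


# Toy problem (hypothetical): pull w toward (2, 2) subject to 50 half-plane
# constraints a_j . w <= 1 that together approximate the unit disk.
target = np.array([2.0, 2.0])
angles = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
constraints = [
    (lambda x, a=np.array([np.cos(t), np.sin(t)]): float(a @ x) - 1.0)
    for t in angles
]
w_final = penalized_zo_descent(lambda x: float(np.sum((x - target) ** 2)),
                               constraints, np.zeros(2))
print("final iterate:", w_final)
```

Each step queries only one constraint instead of all 50, which is the cost saving that motivates sampling constraints. Because the penalty weight is finite, the iterate in this sketch settles between the unconstrained minimizer and the feasible set rather than exactly on the boundary.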

Paper Structure (28 sections, 6 theorems, 57 equations, 3 figures, 12 tables, 1 algorithm)

Introduction
Related Works
Zeroth-Order Methods
Variance Reduction and Momentum Methods
Preliminaries
Problem Setting
Reformulate the Constrained Problem
Proposed Method
Doubly Stochastic Zeroth-order Gradient Method
Momentum and Adaptive Step Size
Convergence Analysis
Stationary point
Convergence Rate of the Accelerated Method
Experiments
Experimental Setup
...and 13 more sections

Key Result

Proposition 5.5

If Assumption assum:penalty_function holds, $\sqrt{\dfrac{2m\epsilon^2+2m^2\lambda^2}{\beta^2}}\leq\epsilon_2^2$ and $(\boldsymbol{w}^*,\boldsymbol{p}^*)$ is the $\epsilon$-stationary point defined in Definition def:minimax of the problem $\min_{\boldsymbol{w}}\max_{\boldsymbol{p}\in\Delta^m}\mathca

Figures (3)

Figure 1: Test accuracy against training time of all the methods in classification with pairwise constraints (We stop the algorithms if the training time is more than 10000 seconds).
Figure 2: Test accuracy against training time of all the methods in classification with fairness constraints (We stop the algorithms if the training time is more than 10000 seconds).
Figure 3: Performance of our method in fairness with kernel method.

Theorems & Definitions (18)

Definition 5.2
Definition 5.3
Definition 5.4
Proposition 5.5
Definition 5.6
Proposition 5.7
Remark 5.8
Lemma 5.11
Lemma 5.12
Lemma 5.13
...and 8 more
