
Variance-Reduced Gradient Estimator for Nonconvex Zeroth-Order Distributed Optimization

Huaiyi Mu, Yujie Tang, Zhongkui Li

TL;DR

A novel variance-reduced gradient estimator is proposed, which randomly renovates one orthogonal direction of the true gradient in each iteration while leveraging historical snapshots for variance correction, to address the trade-off between convergence rate and sampling cost per zeroth-order gradient estimation.

Abstract

This paper investigates distributed zeroth-order optimization for smooth nonconvex problems. We propose a novel variance-reduced gradient estimator, which randomly renovates one orthogonal direction of the true gradient in each iteration while leveraging historical snapshots for variance correction. By integrating this estimator with a gradient tracking mechanism, we address the trade-off between convergence rate and sampling cost per zeroth-order gradient estimation that exists in current zeroth-order distributed optimization algorithms, which rely on either the 2-point or the $2d$-point gradient estimator. We derive a convergence rate of $\mathcal{O}(d^{\frac{5}{2}}/m)$ for smooth nonconvex functions in terms of the sampling number $m$ and the problem dimension $d$. Numerical simulations comparing our algorithm with existing methods confirm the effectiveness and efficiency of the proposed gradient estimator.
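To make the idea in the abstract concrete, the following is a minimal single-agent sketch of a snapshot-based, one-coordinate-per-iteration estimator. It is an illustration under stated assumptions, not the paper's exact formula: the class name `VarianceReducedEstimator`, the use of central differences, the SVRG-style correction term, and the reading of `p` as the probability of refreshing the snapshot with a full $2d$-point estimate are all assumptions made here. Gradient tracking and the multi-agent setting are left out.

```python
import random


def coord_diff(f, x, l, u):
    """Central finite difference of f along coordinate l with radius u (2 queries)."""
    xp, xm = list(x), list(x)
    xp[l] += u
    xm[l] -= u
    return (f(xp) - f(xm)) / (2.0 * u)


def full_estimate(f, x, u):
    """2d-point estimator: one central difference per coordinate (2d queries)."""
    return [coord_diff(f, x, l, u) for l in range(len(x))]


class VarianceReducedEstimator:
    """Hypothetical snapshot-based zeroth-order estimator (assumed form)."""

    def __init__(self, f, x0, u, p, rng=None):
        self.f, self.u, self.p = f, u, p
        self.rng = rng or random.Random(0)
        # The snapshot stores a point and its full 2d-point gradient estimate.
        self.snapshot_x = list(x0)
        self.snapshot_g = full_estimate(f, x0, u)

    def estimate(self, x):
        d = len(x)
        if self.rng.random() < self.p:
            # Assumed rule: with probability p, refresh the snapshot (2d queries).
            self.snapshot_x = list(x)
            self.snapshot_g = full_estimate(self.f, x, self.u)
            return list(self.snapshot_g)
        # Otherwise update one uniformly sampled coordinate (2 queries) and
        # correct it against the stored snapshot value for that coordinate.
        l = self.rng.randrange(d)
        g = list(self.snapshot_g)
        g[l] += d * (coord_diff(self.f, x, l, self.u) - self.snapshot_g[l])
        return g
```

In this sketch, averaging the non-refresh branch over the uniformly sampled coordinate recovers the full $2d$-point estimate at `x`, while each such call costs only two function queries. That is the trade-off between the 2-point and $2d$-point estimators that the abstract refers to.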

Paper Structure

This paper contains 14 sections, 11 theorems, 85 equations, 3 figures, 1 algorithm.

Key Result

Theorem 1

Under Assumption assumption_smooth_f^*, suppose the parameters of Algorithm 1 satisfy $p\in(\frac{1-\sigma^2}{d},1]$, $\sum_{\tau=0}^{\infty} (du_i^{\tau})^2 < \infty$, $u_i^k$ is non-increasing, and […]. Then we have […] and […], where $R_0 = \frac{d}{1-\sigma^2}E_x^0$ and $R_u = \frac{1}{p}(du_i^0)^2 + \sum_{\tau=1}^{\infty} (du_i^{\tau})^2$.
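
As an illustration of the conditions on the smoothing radii (a choice made here for concreteness, not one taken from the paper), take $u_i^k = \frac{c}{d(k+1)}$ for a hypothetical constant $c > 0$. This sequence is non-increasing, and both the summability condition and $R_u$ can be evaluated in closed form:

$$\sum_{\tau=0}^{\infty} (du_i^{\tau})^2 = c^2 \sum_{\tau=0}^{\infty} \frac{1}{(\tau+1)^2} = \frac{\pi^2}{6}c^2 < \infty, \qquad R_u = \frac{1}{p}(du_i^0)^2 + \sum_{\tau=1}^{\infty} (du_i^{\tau})^2 = \frac{c^2}{p} + \left(\frac{\pi^2}{6} - 1\right)c^2.$$

Under this choice, $R_u$ grows as $p$ shrinks, through the $\frac{1}{p}(du_i^0)^2$ term only.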

Figures (3)

  • Figure 1: Convergence of Algorithm 1, ZONE-M with $J = 100$, DGD-2p, and GT-$2d$.
  • Figure 2: Convergence of Algorithm 1 under probability $p = 0.2$, $0.5$, $0.8$, and $1$.
  • Figure 3: Convergence of Algorithm 1 for dimensions $d = 30$, $100$, $200$, and $300$.

Theorems & Definitions (23)

  • Theorem 1
  • proof
  • Remark 1
  • Remark 2
  • Corollary 1
  • proof
  • Remark 3
  • Lemma 1
  • proof
  • Remark 4
  • ...and 13 more