Variance-Reduced Gradient Estimator for Nonconvex Zeroth-Order Distributed Optimization

Huaiyi Mu; Yujie Tang; Zhongkui Li

TL;DR

This paper proposes a variance-reduced gradient estimator that randomly renovates one orthogonal direction of the true gradient in each iteration and uses historical snapshots for variance correction. The estimator targets the trade-off between convergence rate and sampling cost per zeroth-order gradient estimation.

Abstract

This paper investigates distributed zeroth-order optimization for smooth nonconvex problems. We propose a novel variance-reduced gradient estimator, which randomly renovates one orthogonal direction of the true gradient in each iteration while leveraging historical snapshots for variance correction. By integrating this estimator with a gradient tracking mechanism, we address the trade-off between convergence rate and sampling cost per zeroth-order gradient estimation that exists in current zeroth-order distributed optimization algorithms, which rely on either the 2-point or the $2d$-point gradient estimator. We derive a convergence rate of $\mathcal{O}(d^{\frac{5}{2}}/m)$ for smooth nonconvex functions in terms of the number of samples $m$ and the problem dimension $d$. Numerical simulations comparing our algorithm with existing methods confirm the effectiveness and efficiency of the proposed gradient estimator.
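To make the trade-off concrete, the following is a minimal single-agent sketch of the three kinds of estimator mentioned above. It is an illustration under assumptions, not the paper's algorithm: the function names (`two_point`, `two_d_point`, `vr_estimate`), the central-difference form, the smoothing radius `u`, and the exact shape of the snapshot correction are all hypothetical choices made here. The variance-reduced form used is an SVRG-style coordinate correction, which matches the abstract's description only in spirit.

```python
import numpy as np

def coord_diff(f, x, u, l):
    """Central finite difference of f along coordinate l (2 function values)."""
    e = np.zeros(x.size)
    e[l] = 1.0
    return (f(x + u * e) - f(x - u * e)) / (2 * u)

def two_point(f, x, u, rng):
    """2-point estimator: cheap (2 samples) but high variance."""
    d = x.size
    z = rng.standard_normal(d)
    z /= np.linalg.norm(z)  # random direction on the unit sphere
    return d * (f(x + u * z) - f(x - u * z)) / (2 * u) * z

def two_d_point(f, x, u):
    """2d-point estimator: accurate but costs 2d samples per call."""
    return np.array([coord_diff(f, x, u, l) for l in range(x.size)])

def vr_estimate(f, x, x_snap, g_snap, u, rng, l=None):
    """Assumed variance-reduced form: refresh one random coordinate and
    correct it with the stored snapshot (x_snap, g_snap)."""
    d = x.size
    if l is None:
        l = rng.integers(d)  # one coordinate direction, chosen uniformly
    g = g_snap.copy()
    # The factor d makes the correction average to grad f(x) - grad f(x_snap).
    g[l] += d * (coord_diff(f, x, u, l) - coord_diff(f, x_snap, u, l))
    return g

def maybe_refresh_snapshot(f, x, x_snap, g_snap, u, p, rng):
    """With probability p (an assumed rule), take a fresh 2d-point snapshot."""
    if rng.random() < p:
        return x.copy(), two_d_point(f, x, u)
    return x_snap, g_snap
```

In this sketch a non-refresh iteration needs only four function values (two at `x`, two at `x_snap`), while the expensive $2d$-point evaluation happens only when the snapshot is refreshed. Averaging `vr_estimate` over all coordinates `l` recovers the finite-difference gradient at `x`, which is the sense in which the snapshot term corrects the single-coordinate update.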

Paper Structure (14 sections, 11 theorems, 85 equations, 3 figures, 1 algorithm)

Introduction
Formulation and Preliminaries
Problem Formulation
Preliminaries on Distributed Zeroth-Order Optimization
Our Algorithm
Main Results
Outline of Convergence Analysis
Bounding the Variance of VR-GE
Proof Sketch of Theorem 1
Simulation
Comparison with Other Algorithms
Comparison of Algorithm 1 under Different Probabilities
Comparison of Algorithm 1 under Different Dimensions
Conclusion

Key Result

Theorem 1

Under the paper's smoothness assumption, suppose the parameters of Algorithm 1 satisfy $p\in(\frac{1-\sigma^2}{d} ,1]$, $\sum_{\tau=0}^{\infty} (du_i^{\tau})^2 < \infty$, $u_i^k$ is non-increasing, and [...]. Then we have [...] and [...], where $R_0 = \frac{d}{1-\sigma^2}E_x^0$ and $R_u = \frac{1}{p}(du_i^0)^2 + \sum_{\tau=1}^{\infty} (du_i^{\tau})^2$.
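As a worked illustration of the constant $R_u$, consider the hypothetical smoothing-radius schedule $u_i^k = u_0/(k+1)$ with some $u_0 > 0$. This schedule is an assumption chosen here for convenience, not one taken from the paper. It is non-increasing and square-summable, so it meets the two conditions on $u_i^k$ stated in the theorem:

```latex
\sum_{\tau=0}^{\infty} (d u_i^{\tau})^2
  = d^2 u_0^2 \sum_{\tau=0}^{\infty} \frac{1}{(\tau+1)^2}
  = \frac{\pi^2}{6}\, d^2 u_0^2 < \infty,
\qquad
R_u = \frac{1}{p}(d u_0)^2 + d^2 u_0^2 \sum_{\tau=1}^{\infty} \frac{1}{(\tau+1)^2}
    = d^2 u_0^2 \left( \frac{1}{p} + \frac{\pi^2}{6} - 1 \right).
```

Under this schedule, $R_u$ grows as $d^2$ and as $1/p$, so a smaller snapshot probability $p$ enlarges the constant.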

Figures (3)

Figure 1: Convergence of Algorithm 1, ZONE-M with $J = 100$, DGD-2p, and GT-$2d$.
Figure 2: Convergence of Algorithm 1 under probabilities $p$ = 0.2, 0.5, 0.8, and 1.
Figure 3: Convergence of Algorithm 1 for dimensions $d$ = 30, 100, 200, and 300.

Theorems & Definitions (23)

Theorem 1
proof
Remark 1
Remark 2
Corollary 1
proof
Remark 3
Lemma 1
proof
Remark 4
...and 13 more
