Heterogeneous Distributed Zeroth-Order Nonconvex Optimization with Communication Compression

Haonan Wang; Xinlei Yi; Yiguang Hong; Minghui Liwang

TL;DR

The paper addresses distributed zeroth-order optimization in heterogeneous networks by introducing HEDZOC, an algorithm built on a two-point gradient estimator with communication compression. It does not rely on data homogeneity, $\mathcal{O}(pn)$ function evaluations per iteration, or a known PL constant. It develops a Lyapunov-based analysis that bounds the gradient estimator variance via the optimality gap, enabling convergence under general nonconvexity and PL conditions. The main results show linear speedup rates in $n$ across three regimes: general nonconvex, PL with unknown constant, and PL with known constant, with compression-induced overhead becoming negligible as the compressor approaches lossless transmission. Simulations on adversarial example generation with MNIST validate the theory, demonstrating strong convergence and significant communication savings even under substantial data heterogeneity. Overall, the work advances practical, scalable distributed zeroth-order optimization by removing classical restrictive assumptions while maintaining fast convergence and communication efficiency.
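
For reference, the PL condition mentioned above is commonly stated as follows. The constant $\nu$ and the normalization are notational assumptions for this sketch and may differ from the paper's own statement.

```latex
% Polyak-Lojasiewicz (PL) condition in its standard form.
% \nu > 0 and f^* = \min_x f(x) are notation assumed for this sketch.
\[
  \frac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \nu\,\bigl(f(x) - f^*\bigr)
  \quad \text{for all } x \in \mathbb{R}^p .
\]
```

The condition does not require convexity, so it can hold for nonconvex objectives. The "unknown constant" regime refers to choosing algorithm parameters without access to $\nu$.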

Abstract

Distributed zeroth-order optimization is increasingly applied in heterogeneous scenarios where agents possess distinct data distributions and objectives. This heterogeneity poses fundamental challenges for convergence analysis, as existing analyses rely on relatively strong assumptions to ensure theoretical guarantees. Specifically, at least one of the following three assumptions is usually required: (i) data homogeneity across agents, (ii) $\mathcal{O}(pn)$ function evaluations per iteration, with $p$ denoting the dimension and $n$ the number of agents, or (iii) the Polyak--Łojasiewicz (P--L) or strong convexity condition with a known corresponding constant. To overcome these limitations, we propose a Heterogeneous Distributed Zeroth-Order Compressed (HEDZOC) algorithm, which is based on a two-point zeroth-order gradient estimator and a general class of compressors. Without assuming data homogeneity, we develop an analysis covering three settings: general nonconvex functions, functions satisfying the P--L condition with an unknown P--L constant, and functions satisfying it with a known constant. To the best of our knowledge, the proposed HEDZOC algorithm is the first distributed zeroth-order method that establishes convergence without relying on any of the above three assumptions. Moreover, it achieves a linear speedup convergence rate, which is comparable to state-of-the-art results attainable under data homogeneity and exact communication assumptions. Finally, experiments on heterogeneous adversarial example generation validate the theoretical results.
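
The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the HEDZOC algorithm itself. The forward-difference form of the estimator, the top-k compressor, and the names `two_point_grad` and `top_k` are all assumptions chosen for illustration; the paper only states that it uses a two-point estimator and a general class of compressors.

```python
import numpy as np


def two_point_grad(f, x, mu, rng):
    """One common two-point zeroth-order gradient estimate (assumed form).

    Uses two function evaluations, f(x + mu*u) and f(x), along a random
    direction u drawn uniformly from the unit sphere in R^p.
    """
    p = x.size
    u = rng.standard_normal(p)
    u /= np.linalg.norm(u)  # normalized Gaussian is uniform on the sphere
    return (p / mu) * (f(x + mu * u) - f(x)) * u


def top_k(v, k):
    """Top-k sparsifier (hypothetical compressor choice for this sketch).

    Keeps the k largest-magnitude entries of v and zeros out the rest, so
    only k values (plus their indices) would need to be transmitted.
    """
    if not 1 <= k <= v.size:
        raise ValueError("k must satisfy 1 <= k <= v.size")
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out


# Usage: estimate the gradient of a linear function f(x) = a @ x, whose
# true gradient is a, by averaging many two-point estimates.
rng = np.random.default_rng(0)
a = np.array([1.0, -2.0, 0.5])
f = lambda x: a @ x
x0 = np.zeros(3)
avg_estimate = np.mean(
    [two_point_grad(f, x0, 1e-3, rng) for _ in range(20000)], axis=0
)

# Compress a vector before "sending" it to neighbors.
message = np.array([0.1, -3.0, 2.0, 0.5])
compressed = top_k(message, 2)
```

Each call to `two_point_grad` costs two function evaluations regardless of the dimension $p$, in contrast to the $\mathcal{O}(pn)$ evaluations per iteration listed as assumption (ii). With `k` equal to the vector length, `top_k` returns its input unchanged, which corresponds to lossless transmission.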

Paper Structure (31 sections, 20 theorems, 137 equations, 3 figures, 1 table, 1 algorithm)

Introduction
Main Contributions
Organization and Notations
Existing Assumptions and Challenges
Data homogeneity
$\mathcal{O}(pn)$ function evaluations
Polyak--Łojasiewicz condition / Strong Convexity
Problem Formulation
Algorithm Design
Preliminary Convergence Analysis
Lyapunov Analysis
Bound on the Optimality Gap
Main Results
General Nonconvex Setting
P--L Setting with Unknown Constant
...and 16 more sections

Key Result

Lemma 1

Under Assumption \ref{zerosg:ass:zeroth-smooth}, let $\{\mathbf{x}_k\}$ be the sequence generated by Algorithm \ref{nonconvex:algorithm-pdgd}. Then [...], where $\check{\sigma}^2_2=2\ell f^*-\frac{2\ell}{n}\sum_{i=1}^{n}f_i^*\ge 0$, $\mathbf{g}_k^z=\operatorname{col}(g^z_{1,k},\dots,g^z_{n,k})$, $\bar{\mathbf{g}}^z_k=\mathbf{H}\mathbf{g}^z_k$, $g^\mu_{i,k}=\nabla \hat{f}_{i}(x_{i,k},\mu_{i,k})$, ...

Figures (3)

Figure 1: Logical flow of the preliminary convergence analysis and the proof of Theorem \ref{Thm:nonconvex}, where (i)--(iii) correspond to the three key techniques discussed in Section \ref{Section:Proof}.
Figure 2: Evolutions of attack loss with respect to the number of iterations.
Figure 3: Evolutions of attack loss with respect to the number of inter-agent communication bits.

Theorems & Definitions (43)

Remark 1
Remark 2
Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
Lemma 4
proof
...and 33 more
