Compressed Momentum-based Single-Point Zero-Order Algorithm for Stochastic Distributed Nonconvex Optimization

Linjing Chen, Antai Xie, Xinlei Yi, Xiaoqiang Ren, Xiaofan Wang

TL;DR

This work addresses stochastic distributed nonconvex optimization when explicit gradients are unavailable and communication is bandwidth-limited. It introduces the Compressed Momentum-based Single-Point Zeroth-Order (CMSPZO) algorithm, which combines momentum, one-point zeroth-order gradient estimation, and compressed inter-agent communication over a connected graph with weight matrix $\mathbf{W}$ and spectral gap $\delta$. Under appropriate assumptions and parameter choices, CMSPZO converges to the exact solution with diminishing step sizes at rate $O(1/\sqrt[4]{T})$ and to a neighborhood of a stationary point with fixed step sizes at rate $O(1/\sqrt{T})$, while controlling consensus and compression errors. Experiments on a distributed nonconvex logistic regression task show lower communication cost than the baselines with competitive convergence, highlighting the algorithm's practicality in bandwidth-limited networks.
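The TL;DR refers to a weight matrix $\mathbf{W}$ and its spectral gap $\delta$. The sketch below assumes the common definition $\delta = 1 - \max_{i \ge 2} |\lambda_i(\mathbf{W})|$ for a symmetric, doubly stochastic $\mathbf{W}$, and uses Metropolis-Hastings weights as an illustrative choice; the paper's weight design and exact definition of $\delta$ may differ. The helper names `metropolis_weights` and `spectral_gap` are hypothetical. They build $\mathbf{W}$ from a 0/1 adjacency matrix and estimate $\delta$ by power iteration restricted to the subspace orthogonal to the all-ones vector.

```python
from typing import List

Matrix = List[List[float]]


def metropolis_weights(adj: List[List[int]]) -> Matrix:
    """Symmetric, doubly stochastic weights from a 0/1 adjacency matrix (illustrative choice)."""
    n = len(adj)
    deg = [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and adj[i][j]:
                W[i][j] = 1.0 / (1 + max(deg[i], deg[j]))
        W[i][i] = 1.0 - sum(W[i])  # self-weight so that each row sums to 1
    return W


def spectral_gap(W: Matrix, iters: int = 200) -> float:
    """Estimate delta = 1 - max_{i>=2} |lambda_i(W)| by power iteration orthogonal to the ones vector.

    Assumes W is symmetric and doubly stochastic, and that the deterministic start
    vector has a nonzero component along the dominant non-consensus eigenspace.
    """
    n = len(W)
    v = [float(i) for i in range(n)]  # linear ramp: generic disagreement pattern
    lam = 0.0
    for _ in range(iters):
        mean = sum(v) / n
        v = [vi - mean for vi in v]  # remove the consensus (eigenvalue 1) direction
        norm = sum(vi * vi for vi in v) ** 0.5
        if norm == 0.0:  # W removes all disagreement in one step
            return 1.0
        v = [vi / norm for vi in v]
        v = [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(vi * vi for vi in v) ** 0.5  # ||W v|| with ||v|| = 1
    return 1.0 - lam
```

As a check, Metropolis weights on a 6-node ring give $W_{ij} = 1/3$ for each node and its two neighbors, with eigenvalues $(1 + 2\cos(2\pi k/6))/3$, so the largest non-unit magnitude is $2/3$ and $\delta = 1/3$. A larger $\delta$ means the consensus step mixes faster, which is why it appears in the consensus-error bounds.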

Abstract

This paper studies a compressed momentum-based single-point zeroth-order algorithm for stochastic distributed nonconvex optimization, aiming to alleviate communication overhead and address the unavailability of explicit gradient information. In the developed framework, each agent has access only to stochastic zeroth-order information of its local objective function, performs local stochastic updates with momentum, and exchanges compressed updates with its neighbors. We theoretically prove that the proposed algorithm can achieve the exact solution with diminishing step sizes and can achieve a sublinear convergence rate towards a neighborhood of the stationary point with fixed step sizes. Numerical experiments validate the effectiveness and communication efficiency of the proposed algorithm.
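The framework described in the abstract (local zeroth-order updates with momentum, then compressed exchange with neighbors) can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the CMSPZO update itself: it assumes an exponential-moving-average momentum, the classical sphere-sampling one-point estimator, and a CHOCO-style compressed gossip step in which each agent transmits only $C(x_i - \hat{x}_i)$ and mixes using the public copies $\hat{x}_j$. The function `run` and its parameters `alpha`, `beta`, `gamma_g`, `gamma_c`, and `compress` are hypothetical names chosen for illustration.

```python
import random
from typing import Callable, List

Vec = List[float]


def one_point_grad(f: Callable[[Vec], float], x: Vec, gamma: float, rng: random.Random) -> Vec:
    """Single function query at a random point on the sphere of radius gamma around x."""
    d = len(x)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = sum(zi * zi for zi in z) ** 0.5
    u = [zi / norm for zi in z]  # u is uniform on the unit sphere
    value = f([xi + gamma * ui for xi, ui in zip(x, u)])
    return [d / gamma * value * ui for ui in u]


def run(fs, W, x0, steps, alpha, beta, gamma_g, gamma_c, compress, seed=0):
    """Toy compressed momentum zeroth-order loop (assumed structure, for illustration only)."""
    rng = random.Random(seed)
    n, d = len(fs), len(x0[0])
    x = [list(xi) for xi in x0]
    x_hat = [[0.0] * d for _ in range(n)]  # compressed public copies known to neighbors
    m = [[0.0] * d for _ in range(n)]      # per-agent momentum buffers
    for _ in range(steps):
        # 1) local zeroth-order step with exponential-average momentum
        for i in range(n):
            g = one_point_grad(fs[i], x[i], gamma_g, rng)
            m[i] = [beta * mi + (1 - beta) * gi for mi, gi in zip(m[i], g)]
            x[i] = [xi - alpha * mi for xi, mi in zip(x[i], m[i])]
        # 2) each agent transmits only the compressed difference C(x_i - x_hat_i)
        for i in range(n):
            q = compress([xi - hi for xi, hi in zip(x[i], x_hat[i])])
            x_hat[i] = [hi + qi for hi, qi in zip(x_hat[i], q)]
        # 3) consensus correction computed from the public copies only
        x = [
            [x[i][k] + gamma_c * sum(W[i][j] * (x_hat[j][k] - x_hat[i][k]) for j in range(n))
             for k in range(d)]
            for i in range(n)
        ]
    return x
```

With `alpha=0`, an identity compressor, and `gamma_c=1`, step 3 reduces to plain gossip $x \leftarrow \mathbf{W}x$, so all agents converge to the average of their initial points; with a lossy compressor, $\hat{x}_i$ lags $x_i$, and that lag is the compression error the analysis has to control.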

Paper Structure

This paper contains 17 sections, 8 theorems, 146 equations, 2 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

(Proposition 3.3 of mhanna2023single) Under Assumptions 4 and 5, $g_{i,t}$ is a biased estimator of the agent's gradient $\nabla F_i(x_{i,t})$, $\forall i \in \mathcal{V}$, for every $t \ge 0$, i.e., with …, where $b_{i,t} \in \mathbb{R}^d$ is the bias with respect to the true gradient and $\upsilon \in [x_{i,t},\, x_{i,t} + \gamma_g u_{i,t}]$. Moreover, ${{\cal H}_t} = \{ {{\mathbf{x}_0
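To see why a single-point estimate is biased, consider the classical sphere-sampling estimator $g = \frac{d}{\gamma} f(x + \gamma u)\, u$ with $u$ uniform on the unit sphere. This is a swapped-in textbook estimator; the scaling and perturbation distribution behind $g_{i,t}$ in the paper may differ. Its mean is $\nabla f_\gamma(x)$, the gradient of the ball-smoothed surrogate $f_\gamma(x) = \mathbb{E}_{v \sim \mathrm{Unif}(\text{ball})} f(x + \gamma v)$. For affine $f(x) = a^\top x + c$ the mean equals $a$ exactly, because $\mathbb{E}[u] = 0$ and $\mathbb{E}[u u^\top] = I/d$. For curved $f$ with $L$-Lipschitz gradient, $\|\nabla f_\gamma(x) - \nabla f(x)\| \le L\gamma$, a bias term of the kind $b_{i,t}$ captures. The function names below are hypothetical.

```python
import random
from typing import Callable, List

Vec = List[float]


def one_point_estimate(f: Callable[[Vec], float], x: Vec, gamma: float, rng: random.Random) -> Vec:
    """(d / gamma) * f(x + gamma * u) * u with u uniform on the unit sphere: one query per call."""
    d = len(x)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = sum(zi * zi for zi in z) ** 0.5
    u = [zi / norm for zi in z]
    value = f([xi + gamma * ui for xi, ui in zip(x, u)])  # f may be a noisy oracle
    return [d / gamma * value * ui for ui in u]


def average_estimate(f: Callable[[Vec], float], x: Vec, gamma: float, samples: int, seed: int = 0) -> Vec:
    """Monte Carlo mean of the estimator, approximating grad f_gamma(x)."""
    rng = random.Random(seed)
    acc = [0.0] * len(x)
    for _ in range(samples):
        g = one_point_estimate(f, x, gamma, rng)
        acc = [a + gi for a, gi in zip(acc, g)]
    return [a / samples for a in acc]
```

Averaging many estimates for an affine $f$ recovers its coefficient vector up to Monte Carlo error. A single estimate, however, has variance that grows with $f(x)^2/\gamma^2$, which is one motivation for smoothing the estimates with momentum across iterations.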

Figures (2)

  • Figure 1: Evolutions of $P(T)$ with respect to the number of iterations.
  • Figure 2: Evolutions of $P(T)$ with respect to the number of transmitted bits.

Theorems & Definitions (12)

  • Remark 1
  • Lemma 1
  • Definition 1: Compression
  • Remark 2
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Theorem 1
  • Remark 3
  • Corollary 1
  • ...and 2 more
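Definition 1 (Compression) is not reproduced on this page. As an illustration only, assume a contractive definition common in compressed decentralized optimization, $\|C(x) - x\|^2 \le (1 - \omega)\|x\|^2$. Top-$k$ sparsification satisfies it deterministically with $\omega = k/d$, because the $d - k$ discarded smallest-magnitude entries carry at most a $(d-k)/d$ fraction of $\|x\|^2$. The helpers below, including the bit-cost model in `bits_top_k` (32-bit values plus $\lceil \log_2 d \rceil$-bit indices), are hypothetical and only indicate how a bits-based x-axis such as the one in Figure 2 could be counted.

```python
from typing import List

Vec = List[float]


def top_k(v: Vec, k: int) -> Vec:
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    if not 0 <= k <= len(v):
        raise ValueError("k must lie in [0, len(v)]")
    keep = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k]
    out = [0.0] * len(v)
    for i in keep:
        out[i] = v[i]
    return out


def contraction_holds(v: Vec, k: int, tol: float = 1e-12) -> bool:
    """Check ||C(v) - v||^2 <= (1 - k/d) ||v||^2 for C = top_k."""
    d = len(v)
    err = sum((c - x) ** 2 for c, x in zip(top_k(v, k), v))
    return err <= (1.0 - k / d) * sum(x * x for x in v) + tol


def bits_top_k(d: int, k: int, value_bits: int = 32) -> int:
    """Hypothetical cost model: k values plus k indices of ceil(log2 d) bits each."""
    index_bits = (d - 1).bit_length()
    return k * (value_bits + index_bits)
```

For example, sending the top 10 entries of a 1000-dimensional vector costs 420 bits under this model, versus 32000 bits for the uncompressed vector.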