CEDAS: A Compressed Decentralized Stochastic Gradient Method with Improved Convergence

Kun Huang; Shi Pu

TL;DR

This work tackles distributed optimization over networks with communication compression by introducing CEDAS, a compressed exact diffusion method with adaptive stepsizes. Under unbiased compression, CEDAS asymptotically achieves convergence rates comparable to centralized SGD for both smooth strongly convex and smooth nonconvex objectives and, to the authors' knowledge, has the shortest transient times (in terms of their dependence on the graph) for reaching these rates. The authors present a rigorous Lyapunov-based analysis, introduce new recursions to handle compression errors, and establish asymptotically network-independent guarantees without requiring bounded gradient moments or bounded gradient dissimilarity. Numerical experiments on logistic regression and neural networks corroborate the theoretical results, illustrating improved performance under communication constraints and varying network connectivity. Overall, CEDAS combines compression, diffusion, and adaptive stepsizes to yield a practical, scalable, and fast-converging decentralized optimization method with strong theoretical support and empirical validation.
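As its name suggests, CEDAS builds on exact diffusion. As a point of reference only, here is a minimal sketch of plain, uncompressed Exact Diffusion (adapt, correct, combine) with a fixed stepsize on scalar quadratics $f_i(x) = \tfrac{1}{2}(x - b_i)^2$, whose global minimizer is the average of the $b_i$. It omits compression and adaptive stepsizes entirely, so it is not CEDAS; the function name `exact_diffusion`, the quadratic losses, the combination matrix $(I + W)/2$, and the optional Gaussian gradient noise are illustrative assumptions.

```python
import numpy as np


def exact_diffusion(W, b, alpha=0.1, iters=500, noise_std=0.0, seed=0):
    """Uncompressed Exact Diffusion on f_i(x) = 0.5 * (x - b_i)^2, one scalar per node.

    Illustrative only: no compression and a fixed stepsize, so this is NOT CEDAS.
    W is assumed symmetric and doubly stochastic. noise_std > 0 adds Gaussian
    noise to each local gradient as a stand-in for stochastic gradients.
    """
    rng = np.random.default_rng(seed)
    b = np.asarray(b, dtype=float)
    n = b.size
    W_bar = 0.5 * (np.eye(n) + W)      # combination matrix (I + W) / 2
    x = np.zeros(n)                    # x_0: every node starts at 0
    psi_prev = x.copy()                # psi_0 = x_0, so the first correction is zero

    def grad(v):
        # Local (possibly noisy) gradients: grad f_i(v_i) = v_i - b_i
        return v - b + noise_std * rng.standard_normal(n)

    for _ in range(iters):
        psi = x - alpha * grad(x)      # adapt: local gradient step
        phi = psi + x - psi_prev       # correct: cancels the bias of plain diffusion
        x = W_bar @ phi                # combine: one round of neighbor averaging
        psi_prev = psi
    return x
```

On a complete graph with `b = [1, 2, 3, 4, 5]`, every node converges to the average 3.0. Setting `noise_std > 0` mimics stochastic gradients; with a constant stepsize the iterates then fluctuate around the average instead of converging exactly.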

Abstract

In this paper, we consider solving the distributed optimization problem over a multi-agent network under a communication-restricted setting. We study a compressed decentralized stochastic gradient method, termed "compressed exact diffusion with adaptive stepsizes (CEDAS)", and show that the method asymptotically achieves a convergence rate comparable to that of centralized stochastic gradient descent (SGD) for both smooth strongly convex and smooth nonconvex objective functions under unbiased compression operators. In particular, to our knowledge, CEDAS enjoys the shortest transient time to date (with respect to the graph specifics) for achieving the convergence rate of centralized SGD, which behaves as $\mathcal{O}(nC^3/(1-\lambda_2)^2)$ under smooth strongly convex objective functions and $\mathcal{O}(n^3C^6/(1-\lambda_2)^4)$ under smooth nonconvex objective functions, where $n$ is the number of agents, $(1-\lambda_2)$ denotes the spectral gap of the mixing matrix, and $C>0$ is the compression-related parameter. More precisely, CEDAS exhibits the shortest transient times when $C < \mathcal{O}(1/(1-\lambda_2)^2)$, which is common in practice. Numerical experiments further demonstrate the effectiveness of the proposed algorithm.
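To get a feel for how the transient-time bounds above scale with the network, the sketch below evaluates them on a ring graph with all constants dropped, so only relative comparisons are meaningful. The ring weights (1/3 for each node and its two neighbors), the helper names `ring_mixing_matrix`, `spectral_gap`, and `transient_orders`, and the reading of $\lambda_2$ as the second-largest eigenvalue modulus of the mixing matrix are assumptions for illustration; the excerpt does not specify them.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Hypothetical example: ring graph where each node averages itself and
    its two neighbors with weight 1/3 (symmetric, doubly stochastic)."""
    assert n >= 3, "a ring with distinct neighbors needs at least 3 nodes"
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1 / 3
        W[i, (i - 1) % n] = 1 / 3
        W[i, (i + 1) % n] = 1 / 3
    return W


def spectral_gap(W):
    """1 - lambda_2, taking lambda_2 as the second-largest eigenvalue modulus (assumption)."""
    moduli = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
    return 1.0 - moduli[1]


def transient_orders(n, C, gap):
    """Order-of-magnitude transient times from the abstract, constants dropped."""
    strongly_convex = n * C**3 / gap**2
    nonconvex = n**3 * C**6 / gap**4
    return strongly_convex, nonconvex
```

For this ring, the gap is $1-\left(\tfrac{1}{3}+\tfrac{2}{3}\cos(2\pi/n)\right)\approx 4\pi^2/(3n^2)$, so, ignoring constants, the strongly convex bound grows roughly like $n^5C^3$ and the nonconvex one like $n^{11}C^6$.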

Paper Structure (31 sections, 23 theorems, 92 equations, 5 figures, 1 table, 3 algorithms)

Introduction
Related Works
Main Contribution
Notation
Organization
Setup
Assumptions
Algorithm
Preliminary Analysis
Convergence Analysis: Nonconvex Case
Convergence
Transient Time
Convergence Analysis: Strongly Convex Case
Convergence
Transient Time
...and 16 more sections

Key Result

Lemma 1

For any compressor $\mathcal{C}_1\in\mathbb{B}(\delta_1)$, we can choose a compressor $\mathcal{C}_2\in\mathbb{U}(C_2)$ so that an introduced compressor $\mathcal{C}: \mathbb{R}^p \rightarrow \mathbb{R}^p$ defined by $\mathcal{C}\left(x\right) := \mathcal{C}_1(x) + \mathcal{C}_2\left(x - \mathcal{C}
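The statement of Lemma 1 is cut off above. Assuming it follows the standard induced-compressor construction $\mathcal{C}(x) = \mathcal{C}_1(x) + \mathcal{C}_2(x - \mathcal{C}_1(x))$, and assuming the usual class definitions ($\mathbb{U}(C)$: $\mathbb{E}[\mathcal{C}(x)] = x$ and $\mathbb{E}\|\mathcal{C}(x)-x\|^2 \le C\|x\|^2$; $\mathbb{B}(\delta)$: $\mathbb{E}\|\mathcal{C}(x)-x\|^2 \le (1-\delta)\|x\|^2$), the sketch below pairs a biased top-$k$ compressor ($\delta_1 = k_1/p$) with an unbiased rescaled rand-$k$ compressor ($C_2 = p/k_2 - 1$). The specific compressors and the names `top_k`, `rand_k`, and `induced` are hypothetical choices for illustration, not the paper's.

```python
import numpy as np


def top_k(x, k):
    """Biased (contractive) compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out


def rand_k(x, k, rng):
    """Unbiased compressor: keep k uniformly random entries, rescaled by p/k so E[out] = x."""
    p = x.size
    out = np.zeros_like(x)
    idx = rng.choice(p, size=k, replace=False)
    out[idx] = x[idx] * (p / k)
    return out


def induced(x, k1, k2, rng):
    """Hypothetical induced compressor C(x) = C1(x) + C2(x - C1(x)).

    C1 is deterministic, and C2 is unbiased on the residual, so C is unbiased overall.
    """
    c1 = top_k(x, k1)
    return c1 + rand_k(x - c1, k2, rng)
```

Under these assumed definitions, $\mathcal{C}(x) - x = \mathcal{C}_2(r) - r$ with $r = x - \mathcal{C}_1(x)$, so the compression variance is at most $C_2(1-\delta_1)\|x\|^2$: the biased first stage shrinks what the unbiased second stage has to transmit.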

Figures (5)

Figure 1: Roadmap of the analysis.
Figure 2: Illustration of two network topologies.
Figure 3: Residual against the number of iterations. The results are averaged over $10$ repeated runs.
Figure 4: Residual against the communicated bits. The results are averaged over $5$ repeated runs.
Figure 5: Loss against communicated bits. The results are averaged over $2$ repeated runs.

Theorems & Definitions (46)

Lemma 1
Remark 1
Lemma 2
Remark 2
Lemma 3
Proof
Remark 3
Lemma 4
Proof
Lemma 5
...and 36 more
