Decentralized Optimization in Networks with Arbitrary Delays

Tomas Ortega; Hamid Jafarkhani

TL;DR

This paper addresses decentralized optimization over directed graphs with arbitrary communication delays, proposing DT-GO, a gossip-based method that does not require nodes to know their out-degree. By combining local updates with a corrected consensus step and modeling delays with virtual nodes, the authors prove convergence for non-convex objectives and establish a convergence rate that matches centralized SGD up to higher-order terms dependent on topology and delays. The approach unifies delay handling with consensus in directed graphs and demonstrates empirically that increased connectivity accelerates convergence while delays degrade performance. The work broadens the applicability of decentralized optimization to asynchronous, directed networks where degree information is unavailable, with practical implications for distributed learning and sensor networks.
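The sketch below makes the idea of pairing local updates with a corrected consensus step concrete. It is deterministic and it is not DT-GO's update rule. It runs plain decentralized gradient descent on a row-stochastic gossip matrix $W$, where node $n$ rescales its gradient by $d_n = \frac{1}{N\pi_n}$. We assume $\pi$ is the left Perron vector of $W$ ($\pi^\top W = \pi^\top$, entries summing to 1). The following are all hypothetical: the digraph, the quadratic objectives $f_n(x) = \tfrac{1}{2}(x - c_n)^2$, the step size, and the function names. Exact gradients replace stochastic ones. Because $\pi^\top W = \pi^\top$, any fixed point $x^\star$ of the iteration satisfies $\sum_n \pi_n d_n \nabla f_n(x_n^\star) = \frac{1}{N}\sum_n \nabla f_n(x_n^\star) = 0$. For these quadratics, that puts the unweighted average of the local iterates on the true optimum. Without the rescaling, only the $\pi$-weighted average of the iterates is pinned, and it lands on the $\pi$-weighted mean of the targets.

```python
# Hypothetical sketch, NOT DT-GO: decentralized gradient descent on a
# row-stochastic gossip matrix with per-node gradient rescaling d_n = 1/(N pi_n).
IN_NBRS = [[0, 4], [1, 0], [2, 1, 4], [3, 2], [4, 3, 1]]  # hypothetical digraph
N = len(IN_NBRS)
TARGETS = [1.0, 2.0, 3.0, 4.0, 5.0]  # f_n(x) = (x - c_n)^2 / 2; optimum is 3.0


def gossip_matrix(in_nbrs):
    """Row-stochastic W with inverse in-degree weights (self-loops included)."""
    W = [[0.0] * len(in_nbrs) for _ in in_nbrs]
    for n, nbrs in enumerate(in_nbrs):
        for m in nbrs:
            W[n][m] = 1.0 / len(nbrs)
    return W


def stationary(W, iters=2000):
    """Left Perron vector pi of W by power iteration (pi^T <- pi^T W)."""
    pi = [1.0 / len(W)] * len(W)
    for _ in range(iters):
        pi = [sum(pi[n] * W[n][m] for n in range(len(W))) for m in range(len(W))]
    return pi


def run(W, scale, step=0.05, iters=3000):
    """Alternate a local (rescaled) gradient step with one gossip round."""
    x = [0.0] * N
    for _ in range(iters):
        # Local update: each node takes a gradient step scaled by its own factor.
        half = [x[n] - step * scale[n] * (x[n] - TARGETS[n]) for n in range(N)]
        # Consensus step: one round of gossip with the row-stochastic W.
        x = [sum(W[n][m] * half[m] for m in range(N)) for n in range(N)]
    return x


W = gossip_matrix(IN_NBRS)
pi = stationary(W)
plain = run(W, [1.0] * N)                          # no rescaling
corrected = run(W, [1.0 / (N * p) for p in pi])    # d_n = 1 / (N pi_n)

print(round(sum(x * p for x, p in zip(plain, pi)), 4))  # 2.92
print(round(sum(corrected) / N, 4))                     # 3.0
```

The individual iterates still disagree slightly because the step size is constant. The sketch only illustrates the bias that the rescaling removes.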

Abstract

We consider the problem of decentralized optimization in networks with communication delays. To accommodate delays, we need decentralized optimization algorithms that work on directed graphs. Existing approaches require nodes to know their out-degree to achieve convergence. We propose a novel gossip-based algorithm that circumvents this requirement, allowing decentralized optimization in networks with communication delays. We prove that our algorithm converges on non-convex objectives with the same leading-order complexity term as centralized Stochastic Gradient Descent (SGD), and we show that the graph topology and the delays only affect the higher-order terms. We provide numerical simulations that illustrate our theoretical results.

Paper Structure (12 sections, 5 theorems, 17 equations, 7 figures, 1 algorithm)

Introduction
Setup and proposed algorithm
Problem setup
Decentralized averaging
Algorithm design
Incorporating delays
Performance bounds and proof of convergence
Case without delays
Case with delays
DT-GO's convergence rate
Experimental results
Conclusions

Key Result

Lemma 2.1

Consider a digraph $G$ with an associated gossip matrix $W$ as in Definition 1. If every node $n$ multiplies its initial state $x_n(0)$ by a factor $d_n = \frac{1}{N \pi_n}$, then the gossip iterations $x_n(t+1) = \sum_{m=1}^N W_{nm} x_m(t)$ converge to the true mean of the unscaled initial states, i.e., $\lim_{t \to \infty} x_n(t) = \frac{1}{N}\sum_{m=1}^N x_m(0)$ for every node $n$.
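The following is a minimal numerical check of Lemma 2.1. We assume $\pi$ is the left Perron vector of the row-stochastic $W$ ($\pi^\top W = \pi^\top$, entries summing to 1), since this is the reading under which the factor $d_n = \frac{1}{N\pi_n}$ recovers the mean. Under that assumption, $W^t \to \mathbf{1}\pi^\top$, so every node converges to $\sum_m \pi_m x_m(0)$. Scaling each input by $d_m$ turns that sum into $\frac{1}{N}\sum_m x_m(0)$. The 5-node digraph is hypothetical (not the graph of Figure 1), and the weights are inverse in-degrees with self-loops. The vector $\pi$ is computed centrally by power iteration purely for illustration. The helper names are made up for this sketch.

```python
# Hypothetical 5-node digraph (not Figure 1's): IN_NBRS[n] lists the nodes
# that node n listens to, including itself.
IN_NBRS = [[0, 4], [1, 0], [2, 1, 4], [3, 2], [4, 3, 1]]
N = len(IN_NBRS)


def gossip_matrix(in_nbrs):
    """Row-stochastic W with W[n][m] = 1 / in-degree(n) for each in-neighbor m."""
    W = [[0.0] * len(in_nbrs) for _ in in_nbrs]
    for n, nbrs in enumerate(in_nbrs):
        for m in nbrs:
            W[n][m] = 1.0 / len(nbrs)
    return W


def stationary(W, iters=2000):
    """Left Perron vector pi (pi^T W = pi^T, sum 1) by power iteration."""
    pi = [1.0 / len(W)] * len(W)
    for _ in range(iters):
        pi = [sum(pi[n] * W[n][m] for n in range(len(W))) for m in range(len(W))]
    return pi


def gossip(W, x, iters=2000):
    """Apply x_n(t+1) = sum_m W[n][m] x_m(t) for `iters` rounds."""
    for _ in range(iters):
        x = [sum(W[n][m] * x[m] for m in range(len(x))) for n in range(len(x))]
    return x


W = gossip_matrix(IN_NBRS)
pi = stationary(W)
x0 = [1.0, 2.0, 3.0, 4.0, 5.0]  # true mean: 3.0

plain = gossip(W, x0)                                         # no correction
corrected = gossip(W, [x / (N * p) for x, p in zip(x0, pi)])  # scale by d_n

print([round(p, 4) for p in pi])  # [0.24, 0.24, 0.12, 0.16, 0.24]
print(round(plain[0], 4))         # 2.92 (pi-weighted average)
print(round(corrected[0], 4))     # 3.0  (true mean)
```

For this graph, $\pi = (6, 6, 3, 4, 6)/25$. Uncorrected gossip therefore settles on the $\pi$-weighted average $73/25 = 2.92$ instead of the mean $3$, which is the kind of gap that Figure 2 plots.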

Figures (7)

Figure 1: Example of a directed communication graph with $N=5$ nodes.
Figure 2: Corrected and non-corrected node values over time. Initial node values are drawn at random from a normal distribution $\mathcal{N}(0,5)$. The gossip weights are the inverses of the in-degrees, and $G$ is the graph shown in Figure 1.
Figure 3: Example of a graph with delayed links. A delay of 2 rounds has been added to the edge between Node 4 and Node 2 of the graph in Figure 1.
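Here is one way to picture the virtual-node construction, sketched under stated assumptions. A link with a delay of $D$ rounds is replaced by a chain of $D$ hypothetical relay nodes. Each relay copies its single in-neighbor (weight 1), so the receiving node sees the sender's value $D$ rounds late while keeping its original gossip weight. The correction factors then come from $\pi$ of the augmented matrix. The relays start at zero, so they add nothing to the average. The digraph (0-indexed), the `add_delay` helper, and the other names are hypothetical, and the paper's exact construction may differ.

```python
# Hypothetical sketch: model a delayed link with a chain of virtual relay nodes.
IN_NBRS = [[0, 4], [1, 0], [2, 1, 4], [3, 2], [4, 3, 1]]  # hypothetical digraph
N = len(IN_NBRS)  # number of real nodes


def add_delay(in_nbrs, src, dst, delay):
    """Replace the edge src -> dst with a chain of `delay` virtual relay nodes."""
    aug = [list(nbrs) for nbrs in in_nbrs]
    prev = src
    for _ in range(delay):
        aug.append([prev])  # relay listens only to its predecessor (weight 1)
        prev = len(aug) - 1
    # dst now listens to the last relay instead of src, keeping the same weight.
    aug[dst] = [prev if m == src else m for m in aug[dst]]
    return aug


def gossip_matrix(in_nbrs):
    """Row-stochastic W with inverse in-degree weights."""
    W = [[0.0] * len(in_nbrs) for _ in in_nbrs]
    for n, nbrs in enumerate(in_nbrs):
        for m in nbrs:
            W[n][m] = 1.0 / len(nbrs)
    return W


def stationary(W, iters=3000):
    """Left Perron vector pi of W by power iteration."""
    pi = [1.0 / len(W)] * len(W)
    for _ in range(iters):
        pi = [sum(pi[n] * W[n][m] for n in range(len(W))) for m in range(len(W))]
    return pi


def gossip(W, x, iters=3000):
    """Apply x_n(t+1) = sum_m W[n][m] x_m(t) for `iters` rounds."""
    for _ in range(iters):
        x = [sum(W[n][m] * x[m] for m in range(len(x))) for n in range(len(x))]
    return x


aug = add_delay(IN_NBRS, src=4, dst=2, delay=2)
W = gossip_matrix(aug)
pi = stationary(W)               # computed on the augmented graph
x0 = [1.0, 2.0, 3.0, 4.0, 5.0]   # true mean of the real nodes: 3.0
relays = [0.0] * (len(aug) - N)  # virtual nodes start empty

plain = gossip(W, x0 + relays)
corrected = gossip(W, [x / (N * pi[n]) for n, x in enumerate(x0)] + relays)

print(aug[2], aug[5], aug[6])    # [2, 1, 6] [4] [5]
print(round(plain[0], 4))        # 2.7037
print(round(corrected[0], 4))    # 3.0
```

In this example, the delayed edge runs from node 4 to node 2 with $D = 2$, mirroring the setup of Figure 3 on a different graph. Uncorrected gossip converges to $73/27 \approx 2.7037$, while the corrected run still reaches the true mean of the real nodes.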
Figure 4: DT-GO's cost comparison for strongly connected directed random graphs with edge probability $p$. The cost is $\frac{1}{N}\sum_{n=1}^N (n-x_n(t))^2$, and the suboptimality is the difference with respect to the baseline $p=1$, which corresponds to centralized SGD.
Figure 5: DT-GO's consensus comparison for strongly connected directed random graphs with edge probability $p$. Consensus is defined as $\frac{1}{N}\sum_{n=1}^N (\bar{x}(t) - x_n(t))^2$, and the suboptimality is the difference with respect to the baseline $p=1$, which corresponds to centralized SGD.
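The two metrics in the captions of Figures 4 and 5 can be computed as shown below. This sketch assumes scalar node states, nodes indexed $n = 1, \dots, N$ as in the cost formula, and $\bar{x}(t)$ equal to the unweighted average of the node values. The `suboptimality` function is a hypothetical helper that subtracts the $p=1$ baseline curve point by point.

```python
def cost(x):
    """Figure 4 metric: (1/N) * sum_{n=1}^{N} (n - x_n)^2, nodes indexed from 1."""
    N = len(x)
    return sum((n - xn) ** 2 for n, xn in enumerate(x, start=1)) / N


def consensus(x):
    """Figure 5 metric: (1/N) * sum_n (xbar - x_n)^2 with xbar the plain average."""
    N = len(x)
    xbar = sum(x) / N
    return sum((xbar - xn) ** 2 for xn in x) / N


def suboptimality(curve, baseline):
    """Hypothetical helper: metric curve minus the p = 1 baseline, per time step."""
    return [m - b for m, b in zip(curve, baseline)]


print(cost([1.0, 2.0, 3.0]))                    # 0.0
print(round(consensus([1.0, 2.0, 3.0]), 4))     # 0.6667
print(suboptimality([0.75, 0.5], [0.25, 0.5]))  # [0.5, 0.0]
```

For $x = (1, 2, 3)$, every node sits at its own index, so the cost is 0 while the consensus error is $2/3$.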

Theorems & Definitions (6)

Definition 1: Gossip matrix
Lemma 2.1
Proposition 3.1
Proposition 3.2
Theorem 3.4
Lemma 3.5
