Decentralized Optimization in Time-Varying Networks with Arbitrary Delays

Tomas Ortega; Hamid Jafarkhani

TL;DR

A novel gossip-based algorithm, called DT-GO, is introduced that achieves the same complexity order as centralized Stochastic Gradient Descent and is applicable in general directed networks, for example networks with delays or limited acknowledgment capabilities.

Abstract

We consider a decentralized optimization problem for networks affected by communication delays. Examples of such networks include collaborative machine learning, sensor networks, and multi-agent systems. To mimic communication delays, we add virtual non-computing nodes to the network, resulting in directed graphs. This motivates investigating decentralized optimization solutions on directed graphs. Existing solutions assume nodes know their out-degrees, resulting in limited applicability. To overcome this limitation, we introduce a novel gossip-based algorithm, called DT-GO, that does not need to know the out-degrees. The algorithm is applicable in general directed networks, for example networks with delays or limited acknowledgment capabilities. We derive convergence rates for both convex and non-convex objectives, showing that our algorithm achieves the same complexity order as centralized Stochastic Gradient Descent. In other words, the effects of the graph topology and delays are confined to higher-order terms. Additionally, we extend our analysis to accommodate time-varying network topologies. Numerical simulations are provided to support our theoretical findings.
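The virtual-node idea in the abstract can be illustrated with a small sketch. It assumes (these details are not given in the text above) that the gossip matrix `W` is row-stochastic, that entry `W[dst, src]` is the weight node `dst` puts on the value received from node `src`, and that a delay of `delay` rounds is modeled by a chain of `delay` relay nodes that only forward what they receive. The helper name `add_delayed_edge` and the 3-node matrix are hypothetical.

```python
import numpy as np

def add_delayed_edge(W, src, dst, delay):
    """Return an extended gossip matrix in which the link src -> dst
    is delayed by `delay` rounds via a chain of virtual relay nodes.

    Assumption: W is row-stochastic and W[dst, src] is the weight that
    node dst assigns to the value it receives from node src.
    """
    n = W.shape[0]
    if delay == 0:
        return W.copy()
    W_ext = np.zeros((n + delay, n + delay))
    W_ext[:n, :n] = W
    weight = W[dst, src]
    first, last = n, n + delay - 1
    # The first virtual node copies the sender's value.
    W_ext[first, src] = 1.0
    # Each later virtual node copies the previous virtual node.
    for v in range(first + 1, last + 1):
        W_ext[v, v - 1] = 1.0
    # The receiver now listens to the end of the chain, not the sender.
    W_ext[dst, src] = 0.0
    W_ext[dst, last] = weight
    return W_ext

# Hypothetical 3-node directed cycle 0 -> 1 -> 2 -> 0 with self-loops.
W = np.array([
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
])

# Delay the link from node 0 to node 1 by 2 rounds (adds 2 virtual nodes).
W_ext = add_delayed_edge(W, src=0, dst=1, delay=2)
print(W_ext.shape)        # (5, 5)
print(W_ext.sum(axis=1))  # every row still sums to 1
```

The extended matrix stays row-stochastic, but the virtual nodes have no incoming edge from the receiver, so the extended graph is directed even when the original one is not. This is why delays lead naturally to the directed-graph setting.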

Paper Structure (23 sections, 10 theorems, 46 equations, 5 figures, 1 table, 1 algorithm)

Introduction
Previous work
Contributions
Organization
Notation
Proposed algorithm
Problem setup
Decentralized averaging
Algorithm design
Incorporating Delays
Time-invariant analysis
Case without delays
Case with delays
Time-varying analysis
Convex case
...and 8 more sections

Key Result

Lemma 2.1

Consider a strongly connected directed graph $G$ with an associated gossip matrix $W$ as in Definition 1. If every node $n$ multiplies its initial state $x_n^{[0]}$ by a factor $d_n \coloneqq \frac{1}{N \pi_n}$, then the gossip iterations converge to the true mean of the original initial states, $\frac{1}{N}\sum_{n=1}^N x_n^{[0]}$.
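A minimal numerical sketch of the correction in Lemma 2.1 follows. It assumes (none of this is spelled out in the excerpt above) that $W$ is row-stochastic, that nodes update via $x \leftarrow W x$, and that $\pi$ is the stationary distribution of $W$, i.e., $\pi^\top W = \pi^\top$ with entries summing to one. The 5-node edge list is hypothetical and is not the graph of Figure 1. Computing $\pi$ with a centralized eigendecomposition is for illustration only.

```python
import numpy as np

def inverse_in_degree_gossip(edges, num_nodes):
    """Row-stochastic W: each node averages its own value and the values
    of its in-neighbours with equal weights (self-loop included)."""
    A = np.eye(num_nodes)
    for src, dst in edges:
        A[dst, src] = 1.0
    return A / A.sum(axis=1, keepdims=True)

def stationary_distribution(W):
    """Left eigenvector of W for eigenvalue 1, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(W.T)
    idx = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, idx])
    return pi / pi.sum()

def gossip(W, x0, rounds):
    """Run `rounds` gossip iterations x <- W x."""
    x = x0.copy()
    for _ in range(rounds):
        x = W @ x
    return x

# Hypothetical strongly connected digraph on 5 nodes, edges as (src, dst).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3), (4, 2)]
N = 5
W = inverse_in_degree_gossip(edges, N)
pi = stationary_distribution(W)

rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 5.0, size=N)  # arbitrary random initial values

plain = gossip(W, x0, rounds=500)                 # pi-weighted average
corrected = gossip(W, x0 / (N * pi), rounds=500)  # d_n = 1 / (N * pi_n)

print("true mean:    ", x0.mean())
print("plain gossip: ", plain)
print("corrected:    ", corrected)
```

Under these assumptions, plain gossip converges to the $\pi$-weighted average $\sum_n \pi_n x_n^{[0]}$, which generally differs from the mean, while the rescaled initial states give $\sum_n \pi_n \frac{x_n^{[0]}}{N \pi_n} = \frac{1}{N}\sum_n x_n^{[0]}$.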

Figures (5)

Figure 1: A directed communication graph example, with $N=5$.
Figure 2: Plot of corrected and non-corrected node values at different rounds. Initial node values are chosen at random from a normal $\mathcal{N}(0,5)$. The gossip weights are the inverse of the in-degrees, where $G$ is shown in Figure 1.
Figure 3: Example of a graph with delayed links. We have added a delay of 2 rounds to the edge between Node 4 and Node 2 of the graph in Figure 1.
Figure 4: Cost and consensus suboptimality plots for various values of $\lambda$ and $p$. Cost is defined as $\frac{1}{N}\sum_{n=1}^N f_n(x_n^{[k]})$, and consensus is $\frac{1}{N}\sum_{n=1}^N (\bar{x}^{[k]} -x_n^{[k]})^2$. The suboptimality is the difference with respect to the baseline, $p=1$, which is centralized SGD. (a) Cost suboptimality plot for varying levels of $p$, without delays. (b) Consensus suboptimality plot for varying levels of $p$, without delays. (c) Cost suboptimality plot for varying levels of delays, with a complete graph. (d) Consensus suboptimality plot for varying levels of delays, with a complete graph. (e) Cost suboptimality for varying levels of $p$, with a fixed delay probability of $\lambda = 0.3$. (f) Consensus suboptimality for varying levels of $p$, with a fixed delay probability of $\lambda = 0.3$.
Figure 5: Cost and consensus suboptimality plots for various values of $p_{err}$. Cost is defined as $\frac{1}{N}\sum_{n=1}^N f_n(x_n^{[k]})$ and consensus is $\frac{1}{N}\sum_{n=1}^N (\bar{x}^{[k]} -x_n^{[k]})^2$. The suboptimality is the difference with respect to the baseline, $p=1$, which is centralized SGD. (a) Cost suboptimality plot for different values of $p_{err}$. (b) Consensus suboptimality plot for different values of $p_{err}$.

Theorems & Definitions (12)

Definition 1: Gossip matrix
Lemma 2.1
Proposition 3.1
Proposition 3.2
Definition 2
Theorem 4.1
Theorem 4.2
Lemma A.1: Descent recursion for the convex case
Lemma A.2: Descent recursion for the non-convex case
Lemma A.3: Consensus distance recursion for the convex case
...and 2 more
