
Optimality in Decentralized Optimization under Bandwidth Constraints

Alexander Tyurin

Abstract

We consider a realistic decentralized setup with bandwidth-constrained communication and derive optimal time complexities for non-convex stochastic parallel and asynchronous optimization (up to logarithmic factors). We develop the corresponding methods, Grace SGD and Leon SGD, for both homogeneous and heterogeneous settings. Unlike previous work, our optimal bounds are characterized in terms of min-cut/max-flow quantities and rely on tools from Gomory-Hu trees and Steiner Tree Packing problems, providing tighter and more practical complexities.
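The min-cut/max-flow quantities mentioned above are the kind of information a Gomory-Hu tree summarizes. For a weighted undirected graph, the smallest edge weight on the tree path between $u$ and $v$ equals the $u$-$v$ min-cut (max-flow) value. The sketch below is not the paper's construction. It is a minimal illustration that assumes integer bandwidths $b_{ij}$ on an undirected graph and uses Gusfield's simplification, which needs $n-1$ max-flow calls (here Edmonds-Karp) and no graph contraction. The function names `max_flow_min_cut`, `gomory_hu_tree`, and `tree_min_cut` and the 4-node ring are hypothetical.

```python
from collections import deque

def max_flow_min_cut(cap, s, t):
    """Edmonds-Karp max flow on an n x n capacity matrix.

    Returns (flow value, set of nodes on the s-side of a minimum s-t cut)."""
    n = len(cap)
    res = [row[:] for row in cap]  # residual capacities
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        prev = [-1] * n
        prev[s] = s
        queue = deque([s])
        while queue and prev[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if prev[v] == -1 and res[u][v] > 0:
                    prev[v] = u
                    queue.append(v)
        if prev[t] == -1:
            # No augmenting path: the nodes BFS reached form the source side of a min cut
            return flow, {v for v in range(n) if prev[v] != -1}
        # Bottleneck capacity along the path, then push that much flow
        aug, v = float("inf"), t
        while v != s:
            aug = min(aug, res[prev[v]][v])
            v = prev[v]
        v = t
        while v != s:
            res[prev[v]][v] -= aug
            res[v][prev[v]] += aug
            v = prev[v]
        flow += aug

def gomory_hu_tree(n, edges):
    """Gusfield's simplification: n - 1 max-flow calls, no contraction.

    edges: (u, v, bandwidth) for undirected links. Returns tree edges (u, v, min-cut value)."""
    cap = [[0] * n for _ in range(n)]
    for u, v, b in edges:
        cap[u][v] += b  # undirected link: capacity b in both directions
        cap[v][u] += b
    parent = [0] * n
    tree = []
    for i in range(1, n):
        value, side = max_flow_min_cut(cap, i, parent[i])
        tree.append((i, parent[i], value))
        # Later nodes on i's side of this cut are re-hung below i
        for j in range(i + 1, n):
            if j in side and parent[j] == parent[i]:
                parent[j] = i
    return tree

def tree_min_cut(tree, n, u, v):
    """Min u-v cut = smallest edge weight on the tree path from u to v (u != v)."""
    adj = {x: [] for x in range(n)}
    for a, b, w in tree:
        adj[a].append((b, w))
        adj[b].append((a, w))
    stack = [(u, -1, float("inf"))]
    while stack:
        x, par, best = stack.pop()
        if x == v:
            return best
        for y, w in adj[x]:
            if y != par:
                stack.append((y, x, min(best, w)))
    raise ValueError("u and v are not connected")

if __name__ == "__main__":
    # Hypothetical 4-node ring with integer bandwidths
    ring = [(0, 1, 3), (1, 2, 2), (2, 3, 4), (0, 3, 1)]
    t = gomory_hu_tree(4, ring)
    print(t)                          # → [(1, 0, 4), (2, 1, 3), (3, 2, 5)]
    print(tree_min_cut(t, 4, 0, 2))   # → 3
```

Once the tree is built, any pairwise min-cut query is a path traversal over $n-1$ edges. No further max-flow computations are needed.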

Paper Structure (33 sections, 32 theorems, 224 equations, 13 figures, 1 table, 4 algorithms)

Key Result

Theorem 1.5

Suppose that Assumptions ass:lipschitz_constant, ass:lower_bound, and ass:stochastic_variance_bounded hold. Then Grace SGD (Algorithm alg:main) finds an $\varepsilon$-stationary point of (eq:main_problem) after $K = \left\lceil 4 L \Delta/\varepsilon\right\rceil$ iterations, and its time complexity is … up to a universal constant factor, where $\{S_{k,p}\}_{k \in [n], p \in [k]}$ and $\{\bar{w}_{k}\}$ …
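The iteration count in Theorem 1.5 is straightforward to evaluate. The sketch below assumes, as is standard for such assumptions, that $L$ is the smoothness (gradient-Lipschitz) constant and $\Delta$ the initial gap to the lower bound. The function name `grace_sgd_iterations` is hypothetical. The sketch computes only $K$, not the time complexity, which also depends on computation times and bandwidths.

```python
import math

def grace_sgd_iterations(L: float, delta: float, eps: float) -> int:
    """Iteration count K = ceil(4 * L * Delta / eps) as stated in Theorem 1.5.

    L: smoothness constant, delta: initial suboptimality gap, eps: target accuracy."""
    if L <= 0 or delta < 0 or eps <= 0:
        raise ValueError("need L > 0, Delta >= 0, eps > 0")
    return math.ceil(4 * L * delta / eps)

print(grace_sgd_iterations(2.0, 3.0, 0.5))  # 4*2*3/0.5 = 48 → 48
print(grace_sgd_iterations(1.0, 1.0, 0.3))  # 4/0.3 ≈ 13.33 → 14
```

$K$ scales as $1/\varepsilon$, so halving the target accuracy roughly doubles the number of iterations.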

Figures (13)

  • Figure 1: Left: communication graph $G$. Node $i$ has computation time $h_i$; edges denote bidirectional links with bandwidth $b_{ij}=b_{ji}$. Right: a Gomory-Hu tree $T$ of the undirected version of $G$ (Definition \ref{def:gh}), the central tool for the design of optimal decentralized optimization methods.
  • Figure 2: Step-by-step visualization of Algorithm \ref{alg:preprocess} on the Gomory-Hu tree $T$.
  • Figure 3: Step-by-step visualization of the optimal-bandwidth AllReduce for $S=\{1,2,6\}$. Step 0 shows the initial directed graph $G=(V,E,b)$, where each ordered edge $(i,j)$ represents a communication link from worker $i$ to worker $j$ with bandwidth $b_{ij}$. Step 1 shows the corresponding undirected graph $\bar{G}$. Step 2 shows the multigraph $\hat{G}$, where each edge of bandwidth $b_{ij}$ is replaced by $b_{ij}$ parallel unit-bandwidth edges. Step 3 illustrates three edge-disjoint Steiner trees in $\hat{G}$. Step 4 shows the reduce phase: each worker sends one block through each tree to the pivot worker $1$. Step 5 shows the broadcast phase: the pivot sends the aggregated blocks back through the same trees. Note that node $5$ is not in $S$ but is still needed, acting only as a switch that forwards data.
  • Figure 4: Tree of metanodes corresponding to the partition $\{S_{\bar{k}+1,p}\}_{p\in[\bar{k}+1]}$. Each node represents a subset of workers, and every edge corresponds to a communication link whose bandwidth is at most $\bar{w}_{\bar{k}}$.
  • Figure 5: Star graph $G$ in the centralized setting and a Gomory-Hu tree of the corresponding undirected graph $\bar{G}$.
  • ...and 8 more figures
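The reduce/broadcast pattern in the Figure 3 caption can be simulated in a few lines. This minimal sketch covers the data flow only. It assumes the edge-disjoint Steiner trees are already given (the Steiner tree packing step itself is not shown), each tree is a child-to-parent map rooted at the pivot, and the vector is split into one contiguous block per tree. The functions `children_of` and `tree_allreduce` and the example graph are hypothetical and do not reproduce the paper's figure.

```python
from typing import Dict, List

Tree = Dict[int, int]  # child -> parent map, rooted at the pivot worker

def children_of(tree: Tree) -> Dict[int, List[int]]:
    """Invert a child -> parent map into parent -> children lists."""
    ch: Dict[int, List[int]] = {}
    for child, parent in tree.items():
        ch.setdefault(parent, []).append(child)
    return ch

def tree_allreduce(vectors: Dict[int, List[float]], trees: List[Tree],
                   pivot: int) -> Dict[int, List[float]]:
    """Sum the vectors of the workers in S = vectors.keys(), one block per tree.

    Block p is reduced up tree p to the pivot and broadcast back down the same
    tree. Nodes that appear in a tree but not in S only forward data (switches)."""
    d, k = len(vectors[pivot]), len(trees)
    out = {w: [0.0] * d for w in vectors}
    for p, tree in enumerate(trees):
        assert set(vectors) <= set(tree) | {pivot}, f"tree {p} does not span S"
        lo, hi = p * d // k, (p + 1) * d // k  # contiguous block carried by tree p
        ch = children_of(tree)

        def reduce_up(node: int) -> List[float]:
            # Own block (zeros for a switch) plus the partial sums sent by children
            acc = vectors[node][lo:hi] if node in vectors else [0.0] * (hi - lo)
            for c in ch.get(node, []):
                acc = [a + b for a, b in zip(acc, reduce_up(c))]
            return acc

        def broadcast_down(node: int, block: List[float]) -> None:
            # Every worker in S copies the aggregated block; switches just forward it
            if node in out:
                out[node][lo:hi] = block
            for c in ch.get(node, []):
                broadcast_down(c, block)

        broadcast_down(pivot, reduce_up(pivot))
    return out

if __name__ == "__main__":
    # Hypothetical example: S = {1, 2, 6}, pivot 1; nodes 3, 4, 5 act only as switches.
    workers = {1: [1.0, 2.0, 3.0, 4.0], 2: [10.0, 20.0, 30.0, 40.0],
               6: [100.0, 200.0, 300.0, 400.0]}
    trees = [{2: 1, 5: 1, 6: 5}, {3: 1, 2: 3, 4: 2, 6: 4}]  # edge-disjoint by construction
    print(tree_allreduce(workers, trees, pivot=1)[6])  # → [111.0, 222.0, 333.0, 444.0]
```

Each edge of tree $p$ carries only block $p$, about a $1/k$ fraction of the vector, in each direction. Splitting the vector across $k$ edge-disjoint trees lets the links of all trees work in parallel.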

Theorems & Definitions (53)

  • Theorem 1.5: Upper Bound in the Homogeneous Setting
  • Corollary 1.6: Equal computation times
  • Theorem 1.7: Upper Bound in the Heterogeneous Setting
  • Definition 3.1
  • Theorem 3.2: Lower Bound in the Homogeneous Setting
  • Theorem 5.2: Lower Bound in the Heterogeneous Setting
  • Corollary 6.0: Sparse Graphs; proof in Section \ref{sec:cor_sparse}
  • Corollary 6.0: Proof in Section \ref{sec:cor_sparse_two}; see also the dual Corollary \ref{cor:cor_sparse_two}
  • Definition E.1
  • Definition E.2
  • ...and 43 more