Quantized Decentralized Stochastic Learning over Directed Graphs

Hossein Taheri, Aryan Mokhtari, Hamed Hassani, Ramtin Pedarsani

TL;DR

This work tackles the communication bottleneck in decentralized learning over directed graphs by introducing quantized push-sum schemes for both gossip and decentralized stochastic optimization. The proposed methods compress exchanged information while preserving the convergence guarantees of exact-communication push-sum, achieving the same $O\left(\frac{1}{\sqrt{nT}}\right)$ rates for convex problems and the corresponding stationary-point rates for non-convex problems. The analysis shows quantization noise decays at a linear spectral rate, enabling vanishing error and matching performance with significantly reduced communication. Numerical experiments on directed graphs demonstrate substantial reductions in transmitted bits (up to 5–10x) with negligible loss in convergence speed or final accuracy, highlighting practical benefits for large-scale distributed learning.
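To make "compressing exchanged information" concrete, here is a minimal sketch of one common compression operator, unbiased stochastic rounding to a fixed grid. The function name `stochastic_quantize`, the `step` parameter, and the sample values are illustrative assumptions; the quantizers analysed in the paper may differ.

```python
import math
import random


def stochastic_quantize(x, step=0.25, rng=random):
    """Round x to a neighbouring multiple of `step`, unbiased in expectation.

    Hypothetical helper for illustration only, not the paper's quantizer.
    """
    scaled = x / step
    low = math.floor(scaled)
    # Round up with probability equal to the fractional part,
    # so that the expected output equals x.
    p_up = scaled - low
    level = low + (1 if rng.random() < p_up else 0)
    return level * step


# Usage: quantize the same value many times and check the empirical mean.
rng = random.Random(0)
samples = [stochastic_quantize(0.3, step=0.25, rng=rng) for _ in range(20000)]
mean_estimate = sum(samples) / len(samples)
```

Each output needs only the integer `level` to be transmitted, which is where the bit savings come from. The price is added variance, which a convergence analysis has to control.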

Abstract

We consider a decentralized stochastic learning problem where data points are distributed among computing nodes communicating over a directed graph. As the model size gets large, decentralized learning faces a major bottleneck: the heavy communication load incurred as each node transmits large messages (model updates) to its neighbors. To tackle this bottleneck, we propose a quantized decentralized stochastic learning algorithm over directed graphs, based on the push-sum algorithm from decentralized consensus optimization. More importantly, we prove that our algorithm achieves the same convergence rates as the decentralized stochastic learning algorithm with exact communication, for both convex and non-convex losses. Numerical evaluations corroborate our main theoretical results and illustrate a significant speed-up compared to exact-communication methods.
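For orientation, the exact-communication push-sum baseline mentioned above can be sketched for the simplest task, averaging over a directed graph. This is plain push-sum, not the quantized algorithm the paper proposes. The function name, the example graph, and the uniform splitting of mass among out-neighbors are assumptions chosen for illustration.

```python
def push_sum_average(values, out_neighbors, iterations=200):
    """Estimate the average of `values` at every node of a directed graph.

    out_neighbors[j] lists the nodes that node j sends to. Each node splits
    its mass equally among its out-neighbors and itself, which makes the
    implied weight matrix column-stochastic.
    """
    n = len(values)
    x = list(values)   # push-sum numerators
    y = [1.0] * n      # push-sum weights
    for _ in range(iterations):
        new_x = [0.0] * n
        new_y = [0.0] * n
        for j in range(n):
            targets = out_neighbors[j] + [j]  # node j keeps one share itself
            share = 1.0 / len(targets)
            for i in targets:
                new_x[i] += share * x[j]
                new_y[i] += share * y[j]
        x, y = new_x, new_y
    # The ratio corrects for the non-uniform stationary distribution.
    return [xi / yi for xi, yi in zip(x, y)]


# Directed ring 0 -> 1 -> 2 -> 3 -> 0 with one extra edge 0 -> 2.
graph = {0: [1, 2], 1: [2], 2: [3], 3: [0]}
values = [1.0, 2.0, 3.0, 10.0]
estimates = push_sum_average(values, graph)
```

Neither `x` nor `y` alone converges to the average on a directed graph, but their ratio does. A quantized variant would compress what each node sends in place of the raw `x[j]`.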

Paper Structure

This paper contains 16 sections, 8 theorems, 95 equations, 5 figures, 1 table, 2 algorithms.

Key Result

Proposition 2.1

Let the assumptions on the graph and on the weight matrix hold, and let $A$ be the corresponding weight matrix of the workers in a graph $\mathcal{G}$. Then there exist a stochastic vector $\bm\phi\in\mathbb{R}^n$ and constants $0<\lambda<1$ and $C>0$ such that a first bound holds for all $t\ge0$. Moreover, there exists a constant $\delta>0$ such that a second bound holds for all $i \in [n]$ and $t\ge1$.
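In the push-sum literature, bounds of this type for a column-stochastic weight matrix usually take the form below. This is an assumed reading, not the proposition's verbatim statement, and the choice of norm is left unspecified.

```latex
\[
\left\| A^t - \bm\phi \mathbf{1}^\top \right\| \le C\lambda^t
\quad \text{for all } t \ge 0,
\qquad
\left[ A^t \mathbf{1} \right]_i \ge \delta
\quad \text{for all } i \in [n],\ t \ge 1 .
\]
```

Read this way, the first inequality says the powers of $A$ approach a rank-one matrix geometrically fast, and the second keeps the push-sum weights bounded away from zero so that the ratio estimates stay well defined.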

Figures (5)

  • Figure 1: The directed graphs used in the experiments, representing communication between computing nodes.
  • Figure 2: Comparison of the proposed algorithm for the gossip problem and the push-sum protocol using exact communication, based on the iteration number (Top) and the total number of bits communicated between two neighbor nodes (Bottom), over the graphs $\mathcal{G}_1$ and $\mathcal{G}_2$.
  • Figure 3: Comparison of the proposed method and the exact-communication push-sum method with a least-squares objective, based on the iteration number (Top) and the total number of communicated bits (Bottom), over the graphs $\mathcal{G}_1$ and $\mathcal{G}_2$.
  • Figure 4: Comparison of the proposed method and the exact-communication push-sum method in training a neural network on the MNIST dataset, based on the iteration number (Top) and the total number of communicated bits (Bottom).
  • Figure 5: Comparison of the proposed method and the exact-communication push-sum method in training a neural network on the CIFAR-10 dataset, based on the iteration number (Left) and the total number of bits communicated between two neighbor nodes (Right).

Theorems & Definitions (18)

  • Proposition 2.1
  • Example 1
  • Example 2: Low-precision Quantizer
  • Theorem 5.1: Gossip
  • Theorem 5.2: Convex Objectives
  • Remark 1
  • Theorem 5.3: Non-convex Objectives
  • Remark 2
  • Remark 3
  • Remark 4
  • ...and 8 more