Log-Scale Quantization in Distributed First-Order Methods: Gradient-based Learning from Distributed Data

Mohammadreza Doostmohammadian; Muhammad I. Qureshi; Mohammad Hossein Khalesi; Hamid R. Rabiee; Usman A. Khan

TL;DR

The paper tackles gradient-based learning over bandwidth-limited networks by introducing a log-scale quantized, first-order distributed optimization method that uses gradient tracking and operates over weight-balanced (WB), time-varying graphs. Its convergence analysis shows that, for a sufficiently small step size $\alpha$, the system converges to the global optimum despite the sector-bound quantization, modeled by $h_l$ with the log-scale quantizer $q_l(z)$. It also establishes the explicit bound $0<\alpha<\overline{\alpha} = \frac{|\operatorname{Re}\{\lambda_2\}|}{L\overline{\mathcal{K}}}$, which is generalized to switching topologies. The method requires no stochastic weight design: WB weight matrices suffice and remain valid under topology changes, including link failures. Experiments on an academic example and on real data (e.g., logistic regression on MNIST) show that log-scale quantization yields a smaller optimality gap than uniform quantization, especially over structured networks, which points to practical benefits for resource-constrained distributed learning.
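
To make the step-size bound and the WB-under-link-failure claim concrete, the sketch below evaluates $\overline{\alpha} = |\operatorname{Re}\{\lambda_2\}|/(L\overline{\mathcal{K}})$ for a small graph before and after a link fails. It rests on assumptions not stated above: $\lambda_2$ is taken to be the eigenvalue of the graph Laplacian $D-A$ with the second-smallest real part, $L$ and $\overline{\mathcal{K}}$ are supplied as hypothetical constants (`lip`, `k_bar`), and the 4-node ring with edge weights 0.5 is an illustrative topology, not the one drawn in Figure 2.

```python
import numpy as np


def is_weight_balanced(A):
    """In-degree (column sum) equals out-degree (row sum) at every node."""
    return np.allclose(A.sum(axis=0), A.sum(axis=1))


def is_row_stochastic(A):
    """Every row of the weight matrix sums to one."""
    return np.allclose(A.sum(axis=1), 1.0)


def step_size_bound(A, lip, k_bar):
    """alpha_bar = |Re(lambda_2)| / (lip * k_bar); lambda_2 is assumed to be the
    Laplacian eigenvalue with the second-smallest real part."""
    lap = np.diag(A.sum(axis=1)) - A              # Laplacian D - A
    eigs = np.linalg.eigvals(lap)
    lam2 = eigs[np.argsort(eigs.real)[1]]         # skip the zero eigenvalue
    return abs(lam2.real) / (lip * k_bar)


# Undirected 4-node ring, every edge weighted 0.5: both WB and row-stochastic.
n = 4
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 0.5

# Hypothetical link failure: the undirected edge between nodes 0 and 1 drops out.
A_fail = A.copy()
A_fail[0, 1] = A_fail[1, 0] = 0.0

lip, k_bar = 1.0, 1.0   # hypothetical smoothness and sector constants
for name, W in [("ring", A), ("after failure", A_fail)]:
    print(name, is_weight_balanced(W), is_row_stochastic(W),
          round(float(step_size_bound(W, lip, k_bar)), 4))
```

Removing an undirected edge lowers the in- and out-degree of both endpoints equally, so the matrix stays weight-balanced while its rows no longer sum to one; the computed bound also shrinks, because the resulting path graph is less well connected than the ring.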

Abstract

Decentralized strategies are of interest for learning from large-scale data over networks. This paper studies learning over a network of geographically distributed nodes/agents subject to quantization. Each node holds a private local cost function, and these local costs together make up a global cost function that the considered methodology aims to minimize. In contrast to many existing papers, the information exchanged among nodes is log-quantized to address limited network bandwidth in practical situations. We consider a computationally efficient first-order distributed optimization algorithm (with no extra inner consensus loop) that combines node-level gradient correction based on local data with network-level gradient aggregation over nearby nodes only. This method only requires weight-balanced networks, with no need for stochastic weight design. It can handle log-scale quantized data exchange over possibly time-varying and switching network setups. We study convergence over both structured networks (for example, training over data centers) and ad hoc multi-agent networks (for example, training over dynamic robotic networks). Through experimental validation, we show that (i) structured networks generally result in a smaller optimality gap, and (ii) log-scale quantization leads to a smaller optimality gap than uniform quantization.
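
The mechanism described above (local gradient correction, aggregation over neighbors only, log-quantized exchanges, weight-balanced weights) can be illustrated with a minimal discrete-time sketch. Everything specific here is an assumption for illustration and not the paper's exact algorithm: the update form, the quantizer parameterization `rho`, the step sizes `alpha` and `eta`, the quadratic local costs $f_i(x) = (x - b_i)^2/2$, and the directed 5-node ring.

```python
import numpy as np


def log_quantize(z, rho=0.1):
    """Log-scale quantizer q(z) = sign(z) * exp(rho * round(log|z| / rho)), with q(0) = 0."""
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    nz = z != 0
    out[nz] = np.sign(z[nz]) * np.exp(rho * np.round(np.log(np.abs(z[nz])) / rho))
    return out


def consensus_term(A, v):
    """Entry i is sum_j a_ij * (v_j - v_i); the entries sum to zero if A is weight-balanced."""
    return A @ v - A.sum(axis=1) * v


def quantized_tracking(A, b, alpha=0.1, eta=0.3, rho=0.1, iters=500, seed=0):
    """Gradient tracking for hypothetical local costs f_i(x) = (x - b_i)^2 / 2,
    where nodes only ever combine log-quantized copies of x and y."""
    def grad(x):
        return x - b                      # stacked local gradients, one entry per node

    x = np.random.default_rng(seed).normal(size=len(b))
    y = grad(x)                           # tracker starts at the local gradients
    for _ in range(iters):
        x_new = x + eta * consensus_term(A, log_quantize(x, rho)) - alpha * y
        y = y + eta * consensus_term(A, log_quantize(y, rho)) + grad(x_new) - grad(x)
        x = x_new
    return x, y


# Directed 5-node ring with unit weights: weight-balanced but not symmetric.
n = 5
A = np.zeros((n, n))
for i in range(n):
    A[i, (i - 1) % n] = 1.0               # node i listens to node i - 1
b = np.arange(n, dtype=float)             # hypothetical local data; optimum is mean(b)
x, y = quantized_tracking(A, b)
print("average iterate:", x.mean(), "optimum:", b.mean())
print("max disagreement:", np.abs(x - x.mean()).max())
```

Because the weights are balanced, the consensus terms cancel when summed over nodes, so the sum of `y` always equals the sum of the local gradients. For these unit-curvature quadratics the network average of `x` therefore contracts toward `mean(b)` by a factor `1 - alpha` per step regardless of quantization; quantization only affects how closely the individual iterates agree.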

Paper Structure (13 sections, 4 theorems, 42 equations, 10 figures, 1 algorithm)

Introduction
Assumptions, Terminology, and Problem Statement
Effect of Network Structure and Quantization
Distributed Optimization over Networks: Algorithm and Main Results
  Algorithm
  Proof of Convergence
  Discussions
Numerical Experiments
  Academic Example
  Real Data-Set Example
Conclusions
  Concluding Remarks
  Future Directions

Key Result

Lemma 1

[cai2012average] Consider the square matrix $P(\alpha)$ of size $n$, which depends on a parameter $\alpha \in \mathbb{R}_{\geq 0}$. Let $P(0)$ have $N<n$ equal eigenvalues $\lambda_1=\ldots=\lambda_N$, associated with right and left unit eigenvectors $\mathbf{v}_1,\ldots,\mathbf{v}_N$ and $\mathbf{u}_1,\ldots,\mathbf{u}_N$ …
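
The lemma statement above is cut off. Assuming it concludes in the standard way for a repeated semisimple eigenvalue (the derivatives $d\lambda_i/d\alpha$ at $\alpha=0$ are the eigenvalues of $U^{H} P'(0) V$, where $V$ and $U$ stack the right and left eigenvectors with $U^{H}V = I$), the sketch below checks that first-order prediction by finite differences. The matrices `P0` and `P1` are a hypothetical example with $P(\alpha) = P_0 + \alpha P_1$, $n = 3$, and $N = 2$.

```python
import numpy as np

# P(alpha) = P0 + alpha * P1, where P0 has the double eigenvalue 0 (N = 2 < n = 3).
P0 = np.diag([0.0, 0.0, -1.0])
P1 = np.array([[1.0, 2.0, 0.5],
               [0.0, 3.0, 1.0],
               [1.0, 1.0, 1.0]])

# Right (V) and left (U) eigenvectors of the repeated eigenvalue, with U^T V = I.
V = np.eye(3)[:, :2]
U = np.eye(3)[:, :2]
predicted = np.sort(np.linalg.eigvals(U.T @ P1 @ V).real)   # first-order slopes at alpha = 0

# Finite-difference estimate: slopes of the two eigenvalues of P(eps) that leave 0.
eps = 1e-6
eigs = np.linalg.eigvals(P0 + eps * P1)
near_zero = eigs[np.argsort(np.abs(eigs))[:2]]
estimated = np.sort((near_zero / eps).real)
print("predicted:", predicted, "estimated:", estimated)
```

The same reasoning is what makes such a result useful for step-size analysis: the sign of the real parts of the predicted slopes tells whether the repeated eigenvalue moves into the left or right half-plane for small $\alpha > 0$.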

Figures (10)

Figure 1: Uniform quantization (a non-sector-bound function) versus logarithmic quantization (a sector-bound function). Logarithmic quantization is finer around zero, whereas uniform quantization has the same resolution everywhere.
Figure 2: An example graph topology whose weights are both stochastic and WB. The red link represents an unreliable channel that may fail or drop packets. Removing this link preserves the WB condition but destroys stochasticity, so many weight-stochastic algorithms in the literature would need to redesign the weights to retain convergence.
Figure 3: Logarithmically quantized distributed optimization over an exponential network versus an Erdős–Rényi (ER) random network of the same size.
Figure 4: Optimality gap of uniformly quantized versus logarithmically quantized distributed learning over exponential network.
Figure 5: Comparison of the proposed log-quantized algorithm with existing non-quantized algorithms for a strongly convex cost model.
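
The contrast drawn in Figure 1 can be reproduced numerically. The sketch below assumes the common log-quantizer form $q(z) = \operatorname{sgn}(z)\exp(\rho\,\mathrm{round}(\log|z|/\rho))$ and a uniform quantizer with fixed step $\delta$; both parameter values are hypothetical and the paper's exact $q_l$ may be parameterized differently. A sector-bound quantizer keeps the ratio $q(z)/z$ inside a fixed positive interval for every $z \neq 0$, which the log quantizer does and the uniform one does not.

```python
import numpy as np


def log_quantize(z, rho=0.2):
    """Log-scale quantizer: sign(z) * exp(rho * round(log|z| / rho)), with q(0) = 0."""
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    nz = z != 0
    out[nz] = np.sign(z[nz]) * np.exp(rho * np.round(np.log(np.abs(z[nz])) / rho))
    return out


def uniform_quantize(z, delta=0.2):
    """Uniform quantizer: nearest multiple of the fixed step delta."""
    return delta * np.round(np.asarray(z, dtype=float) / delta)


rho = delta = 0.2
z = np.concatenate([-np.logspace(-4, 1, 50), np.logspace(-4, 1, 50)])  # nonzero test inputs
log_ratio = log_quantize(z, rho) / z
uni_ratio = uniform_quantize(z, delta) / z

# Sector bound: q(z)/z stays within [exp(-rho/2), exp(rho/2)] for every z != 0.
print(log_ratio.min(), log_ratio.max(), np.exp(-rho / 2), np.exp(rho / 2))
# Small inputs are rounded to 0 by the uniform quantizer, so q(z)/z can drop to 0.
print(uni_ratio.min())
```

The log quantizer's relative error is the same at every scale, which is why it resolves values near zero finely; the uniform quantizer's absolute error is fixed, so its relative error blows up as inputs approach zero.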

Theorems & Definitions (11)

Lemma 1
Lemma 2
Theorem 1
proof
Theorem 2
proof
Remark 1
Remark 2
Remark 3
Remark 4