Log-Scale Quantization in Distributed First-Order Methods: Gradient-based Learning from Distributed Data
Mohammadreza Doostmohammadian, Muhammad I. Qureshi, Mohammad Hossein Khalesi, Hamid R. Rabiee, Usman A. Khan
TL;DR
The paper tackles gradient-based learning over bandwidth-limited networks by introducing a first-order distributed optimization method with log-scale quantized information exchange. The method uses gradient tracking and operates on weight-balanced (WB), possibly time-varying graphs. The convergence analysis shows that, for a sufficiently small step-size $\alpha$, the iterates converge to the global optimum despite the sector-bounded quantization nonlinearity ($h_l$, with log-scale quantizer $q_l(z)$). It also establishes the explicit bound $0<\alpha<\overline{\alpha} = \frac{|\operatorname{Re}\{\lambda_2\}|}{L\overline{\mathcal{K}}}$, which is generalized to switching topologies. The method does not require stochastic weight design: WB matrices suffice, and they accommodate topology changes, including link failures. Empirical results on academic examples and real data (e.g., logistic regression on MNIST) show that log-scale quantization yields a smaller optimality gap than uniform quantization and that structured networks generally yield a smaller gap than ad-hoc ones, highlighting practical benefits for resource-constrained distributed learning.
Abstract
Decentralized strategies are of interest for learning from large-scale data over networks. This paper studies learning over a network of geographically distributed nodes/agents subject to quantization. Each node possesses a private local cost function, and these local costs collectively form a global cost function that the considered methodology aims to minimize. In contrast to many existing papers, the information exchange among nodes is log-quantized to address limited network bandwidth in practical situations. We consider a computationally efficient first-order distributed optimization algorithm (with no extra inner consensus loop) that leverages node-level gradient correction based on local data and network-level gradient aggregation only over nearby nodes. This method requires only balanced networks, with no need for stochastic weight design. It can handle log-scale quantized data exchange over possibly time-varying and switching network setups. We study convergence over both structured networks (for example, training over data centers) and ad-hoc multi-agent networks (for example, training over dynamic robotic networks). Through experimental validation, we show that (i) structured networks generally result in a smaller optimality gap, and (ii) log-scale quantization leads to a smaller optimality gap compared to uniform quantization.
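To make the contrast between log-scale and uniform quantization concrete, the following minimal Python sketch may help. It assumes a common textbook form of the two quantizers, namely $q_l(z) = \operatorname{sign}(z)\exp\left(\rho\left[\log|z|/\rho\right]\right)$ and $q_u(z) = \rho\left[z/\rho\right]$, where $[\cdot]$ denotes rounding to the nearest integer. The function names `log_quantize` and `uniform_quantize`, the parameter name `rho`, and its default value are illustrative assumptions and are not taken from the paper. Under this definition the ratio $q_l(z)/z$ stays within $[e^{-\rho/2}, e^{\rho/2}]$ for every nonzero $z$, which is one way to read the sector-bound property: the relative error does not grow as $z$ shrinks, whereas a uniform quantizer maps all sufficiently small values to zero.

```python
import math


def log_quantize(z: float, rho: float = 0.25) -> float:
    """Log-scale quantizer (assumed form): sign(z) * exp(rho * round(log|z| / rho)).

    `rho` is a hypothetical quantization-level parameter; zero maps to zero.
    """
    if z == 0.0:
        return 0.0
    # Quantize uniformly in the log domain, then map back.
    level = rho * round(math.log(abs(z)) / rho)
    return math.copysign(math.exp(level), z)


def uniform_quantize(z: float, rho: float = 0.25) -> float:
    """Uniform quantizer (assumed form): rho * round(z / rho)."""
    return rho * round(z / rho)


if __name__ == "__main__":
    rho = 0.25
    for z in (1e-6, 0.01, 0.3, 7.5, -42.0):
        ql = log_quantize(z, rho)
        qu = uniform_quantize(z, rho)
        # The ratio ql / z stays inside the sector [exp(-rho/2), exp(rho/2)].
        print(f"z={z:>10}  log-scale={ql:.6g}  uniform={qu:.6g}  ratio={ql / z:.4f}")
```

Running the script prints, for each sample value, both quantized outputs and the ratio $q_l(z)/z$. Small inputs such as `0.01` survive log-scale quantization with bounded relative error but are rounded to zero by the uniform quantizer, which is consistent with (though not a proof of) the smaller optimality gap reported for log-scale quantization.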
