Online Distributed Learning with Quantized Finite-Time Coordination
Nicola Bastianello, Apostolos I. Rikos, Karl H. Johansson
TL;DR
The paper tackles online distributed optimization over directed, fully decentralized networks without a fusion center. It introduces a distributed online projected gradient method that uses a finite-time quantized coordination (FTQC) protocol to approximate the consensus projection with quantized communications and accommodates stochastic gradients. A convergence analysis shows mean convergence bounds that capture the effects of quantization (step size $\Delta$), gradient inaccuracy ($\tau$), and time-varying costs ($\sigma$), yielding a limiting error of $(\sigma + \gamma + \alpha \tau)/(1 - \zeta)$, where $\zeta$ depends on the objective's strong convexity and smoothness. Numerical results on online logistic regression demonstrate that FTQC-DGD can achieve smaller asymptotic errors than alternatives under quantization, while revealing the trade-offs between batch size, quantization level, and online data shifts. Overall, the work provides a scalable, robust framework for privacy-preserving, bandwidth-efficient online learning in peer-to-peer networks with directed communication.
Abstract
In this paper we consider online distributed learning problems. Online distributed learning refers to the process of training learning models on distributed data sources. In our setting, a set of agents needs to cooperatively train a learning model from streaming data. Unlike federated learning, the proposed approach does not rely on a central server but only on peer-to-peer communications among the agents. This approach is often used in scenarios where data cannot be moved to a centralized location due to privacy, security, or cost reasons. To overcome the absence of a central server, we propose a distributed algorithm that relies on a quantized, finite-time coordination protocol to aggregate the locally trained models. Furthermore, our algorithm allows for the use of stochastic gradients during local training. Stochastic gradients are computed using a randomly sampled subset of the local training data, which makes the proposed algorithm more efficient and scalable than traditional gradient descent. We analyze the performance of the proposed algorithm in terms of the mean distance from the online solution. Finally, we present numerical results for a logistic regression task.
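The two ingredients named in the abstract, local training with stochastic gradients and aggregation of quantized models, can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the averaging step below is computed directly as a stand-in for the finite-time coordination protocol, and the function names (`quantize`, `stochastic_logistic_gradient`, `online_round`), the uniform rounding quantizer, and all parameter values are assumptions made for illustration only.

```python
import math
import random


def quantize(vector, delta):
    """Uniform quantizer (assumed): round each entry to a multiple of delta."""
    return [delta * round(v / delta) for v in vector]


def stochastic_logistic_gradient(w, samples, batch_size, rng):
    """Gradient of the mean logistic loss on a random mini-batch.

    samples is a list of (features, label) pairs with label in {-1, +1}.
    """
    batch = rng.sample(samples, batch_size)
    grad = [0.0] * len(w)
    for features, label in batch:
        margin = label * sum(wi * xi for wi, xi in zip(w, features))
        # d/dw log(1 + exp(-margin)) = -label * features / (1 + exp(margin))
        coeff = -label / (1.0 + math.exp(margin))
        for j, xj in enumerate(features):
            grad[j] += coeff * xj / batch_size
    return grad


def online_round(models, local_data, alpha, delta, batch_size, rng):
    """One round: local stochastic gradient steps, then quantized averaging."""
    stepped = []
    for w, samples in zip(models, local_data):
        g = stochastic_logistic_gradient(w, samples, batch_size, rng)
        stepped.append([wi - alpha * gi for wi, gi in zip(w, g)])
    # Stand-in for the coordination protocol: average the quantized models
    # directly. A peer-to-peer protocol would have to reach an aggregate
    # through neighbor-to-neighbor messages only.
    quantized = [quantize(w, delta) for w in stepped]
    average = [sum(column) / len(quantized) for column in zip(*quantized)]
    return [list(average) for _ in models]


# Hypothetical demo: 4 agents, 3 features, synthetic local data.
rng = random.Random(0)
local_data = [
    [([rng.gauss(0, 1) for _ in range(3)], rng.choice([-1.0, 1.0]))
     for _ in range(20)]
    for _ in range(4)
]
models = [[0.0, 0.0, 0.0] for _ in range(4)]
for _ in range(50):
    models = online_round(models, local_data, alpha=0.1, delta=0.01,
                          batch_size=5, rng=rng)
```

In this sketch a smaller `delta` reduces the rounding error in the aggregated model, and a larger `batch_size` reduces the variance of the local gradients. These are the same two knobs whose trade-offs the numerical results are said to explore.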
