Asynchronous Local Computations in Distributed Bayesian Learning

Kinjal Bhar; He Bai; Jemin George; Carl Busart

TL;DR

This work develops an asynchronous gossip-based distributed Bayesian learning framework that uses multiple local Unadjusted Langevin Algorithm (ULA) updates between inter-agent communications to reduce communication overhead. By modeling gossip with a Poisson process and reusing mini-batch gradients within cycles, the method achieves consensus and converges to the Bayesian posterior p^* under Lipschitz and log-Sobolev assumptions, with rates that scale polynomially with the cycle parameter δ_α. Theoretical results establish consensus and KL-divergence convergence, while experiments on a toy Gaussian problem and real datasets (Gamma Telescope and mHealth) demonstrate faster initial convergence and robust classification performance, particularly in low-data regimes. The approach is applicable to both decentralized and federated-like settings, offering practical gains in speed and uncertainty quantification. Key contributions include a formal analysis of local computations per cycle, explicit step-size and local-iteration conditions, and validation of asynchronous gossip-ULA in realistic classification tasks.
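For reference, the standard single-chain ULA step that the local updates build on can be written as follows. The symbols $U$ (negative log-posterior), $\alpha$ (step size), and $v_k$ (injected Gaussian noise) are notation assumed here for illustration; the paper's distributed update adds gossip averaging and may use different scaling.

```latex
% Standard ULA targeting p^*(w) \propto \exp(-U(w)); notation is assumed, not taken from the paper.
\begin{equation*}
  w_{k+1} = w_k - \alpha \, \nabla U(w_k) + \sqrt{2\alpha}\, v_k,
  \qquad v_k \sim \mathcal{N}(0, I).
\end{equation*}
```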

Abstract

As machine learning (ML) expands into sensor networking, cooperative robotics, and many other multi-agent systems, distributed deployment of inference algorithms has received considerable attention. These algorithms collaboratively learn unknown parameters from dispersed data collected by multiple agents. Two aspects compete in such algorithms: intra-agent computation and inter-agent communication. Traditionally, algorithms are designed to perform both synchronously. However, some settings call for frugal use of communication channels because they are unreliable, time-consuming, or resource-expensive. In this paper, we propose gossip-based asynchronous communication to leverage fast computations and reduce communication overhead simultaneously. We analyze the effects of multiple (local) intra-agent computations performed by the active agents between successive inter-agent communications. For local computations, Bayesian sampling via unadjusted Langevin algorithm (ULA) MCMC is used. The communication is assumed to take place over a connected graph (e.g., as in decentralized learning); however, the results can be extended to coordinated communication with a central server (e.g., federated learning). We theoretically quantify the convergence rates of the process. To demonstrate the efficacy of the proposed algorithm, we present simulations on a toy problem as well as on real-world data sets, training ML models to perform classification tasks. We observe faster initial convergence and improved accuracy, especially in the low-data regime. We achieve, on average, 78% and over 90% classification accuracy on the Gamma Telescope and mHealth data sets, respectively, both from the UCI ML repository.
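The idea of several local ULA steps between gossip exchanges can be illustrated with a minimal sketch on a 1D Gaussian mean-estimation problem. Everything specific here is an assumption made for illustration and not the paper's algorithm: the function name `gossip_ula`, pairwise averaging as the communication step, scaling the local gradient by the number of agents, full local gradients (no mini-batches), a ring graph, and all numeric values. Picking an edge uniformly at random corresponds to the order in which independent, equal-rate Poisson clocks on the edges would tick.

```python
import numpy as np

def gossip_ula(data, edges, n_cycles=2000, local_steps=3, alpha=1e-3,
               sigma2=1.0, prior_var=10.0, seed=0):
    """Toy asynchronous gossip ULA; returns a (n_cycles, n_agents) trajectory."""
    rng = np.random.default_rng(seed)
    n = len(data)
    w = np.zeros(n)                      # one scalar sample per agent
    history = np.empty((n_cycles, n))

    def local_grad(i, wi):
        # Gradient of agent i's share of the negative log-posterior:
        # 1/n of the Gaussian prior plus the local Gaussian likelihood.
        return wi / (n * prior_var) + np.sum(wi - data[i]) / sigma2

    for k in range(n_cycles):
        # One edge wakes up; only its two endpoints are active this cycle.
        i, j = edges[rng.integers(len(edges))]
        # Communication (assumed form): the active pair averages its samples.
        w[i] = w[j] = 0.5 * (w[i] + w[j])
        # Computation: each active agent runs several local ULA steps.
        for a in (i, j):
            for _ in range(local_steps):
                noise = rng.standard_normal()
                w[a] = w[a] - alpha * n * local_grad(a, w[a]) + np.sqrt(2 * alpha) * noise
        history[k] = w
    return history

# Hypothetical data: 4 agents, 25 observations each, true mean 2.0.
data_rng = np.random.default_rng(1)
data = [data_rng.normal(2.0, 1.0, size=25) for _ in range(4)]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # ring graph (connected)
history = gossip_ula(data, edges)

# Exact posterior mean for the N(0, 10) prior and unit-variance likelihood.
all_x = np.concatenate(data)
post_mean = all_x.sum() / (1.0 / 10.0 + all_x.size)
estimate = history[1000:].mean()           # discard the first half as burn-in
print(f"posterior mean {post_mean:.3f}, gossip-ULA estimate {estimate:.3f}")
```

Raising `local_steps` lets each active agent move further toward its own local posterior before the next exchange, which is the computation-versus-communication trade-off the abstract describes.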

Paper Structure (22 sections, 5 theorems, 86 equations, 5 figures, 1 algorithm)

Introduction
Preliminaries
Bayesian inference
Unadjusted Langevin Algorithm (ULA)
Gossip-based Protocol
Stochastic Gradient
Methodology
Theoretical results
Consensus
Convergence
Discussion on Results
Experiments
1D Gaussian toy Problem
Classification
Binary Classification
...and 7 more sections

Key Result

Theorem 1

Suppose that the assumptions (Lipschitz continuity through stochastic gradient) hold and Conditions 1-3 are satisfied. Then the consensus error $\tilde{\mathbf{w}}(k+1)$ satisfies a bound whose expression is not reproduced in this summary.

Figures (5)

Figure 1: Comparison of the KL divergence using gossip ULA for different numbers of local computations per cycle; averaged over all agents and all chains.
Figure 2: Comparison of the accuracy on the test data set averaged over $10$ trials for each agent. (Here, 'can' and 'gos' denote canonical and gossip respectively.)
Figure 3: Comparison of the accuracy on the test data set for $6$ agents for $T=1$, $T=3$ and $T=5$ local computations via gossip ULA.
Figure 4: Comparison of the accuracy on the test data set for $6$ agents for $T=1$ and varying $T$ (denoted by 'var') local computations via gossip ULA.
Figure 5: Comparison of the accuracy on the test data set for $6$ agents for $T=1$ and $T=8$ local computations via gossip ULA for a larger training data set.
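
For a 1D Gaussian target, a KL divergence like the one plotted in Figure 1 can be evaluated in closed form once the samples are summarized by their mean and variance. The sketch below assumes a Gaussian moment fit of the samples; the helper names `kl_gaussian` and `kl_from_samples` are hypothetical, and the paper may estimate the divergence differently.

```python
import math
import statistics

def kl_gaussian(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for univariate Gaussians."""
    return 0.5 * (math.log(var_p / var_q)
                  + (var_q + (mu_q - mu_p) ** 2) / var_p
                  - 1.0)

def kl_from_samples(samples, mu_p, var_p):
    """Fit a Gaussian to the samples by moments, then compare it to the target."""
    mu_q = statistics.fmean(samples)
    var_q = statistics.pvariance(samples)
    return kl_gaussian(mu_q, var_q, mu_p, var_p)

# Identical distributions give zero divergence; a unit mean shift gives 0.5.
print(kl_gaussian(0.0, 1.0, 0.0, 1.0))
print(kl_gaussian(1.0, 1.0, 0.0, 1.0))
```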

Theorems & Definitions (5)

Theorem 1
Theorem 2
Lemma 1
Lemma 2
Lemma 3
