DIGing-SGLD: Decentralized and Scalable Langevin Sampling over Time-Varying Networks

Waheed U. Bajwa, Mert Gurbuzbalaban, Mustafa Ali Kutbay, Lingjiong Zhu, Muhammad Zulqarnain

TL;DR

DIGing-SGLD addresses decentralized Bayesian posterior sampling over time-varying networks by integrating stochastic gradient Langevin dynamics with gradient-tracking inspired by DIGing. Under standard strong convexity and smoothness assumptions, it delivers finite-time, non-asymptotic $\mathcal{W}_2$ guarantees to an $O(\sqrt{\eta})$ neighborhood of the Gibbs distribution, with an explicit iteration complexity of $K = O\big(\log(1/\epsilon)/\epsilon^{2}\big)$ when the stepsize is set $\eta = O(\epsilon^{2})$. The method operates without a central coordinator and accommodates time-varying connectivity, achieving convergence rates matching those of centralized and static-graph SGLD despite network drift and gradient noise. Numerical experiments on Bayesian linear and logistic regression validate the theory, showing robust performance and reduced sampling bias under dynamic network conditions. Overall, the paper provides the first non-asymptotic, explicit-constant guarantees for decentralized SGLD on time-varying graphs and demonstrates practical viability for scalable, coordinator-free Bayesian inference in evolving networks.
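
The iteration complexity quoted above can be recovered with a short calculation. The following is a sketch under an assumed form of the bound: the constants $C_0, C_1, c > 0$ are placeholders introduced here for illustration and are not the paper's explicit constants. Suppose that after $K$ iterations with stepsize $\eta$,

$$
\mathcal{W}_2\big(\mathrm{Law}(x^{(K)}), \pi\big) \le C_0 (1 - c\eta)^{K} + C_1 \sqrt{\eta},
$$

where $\pi$ denotes the Gibbs distribution. Choosing $\eta = \epsilon^{2}/(4 C_1^{2})$ makes the second term equal to $\epsilon/2$. Since $(1 - c\eta)^{K} \le e^{-c\eta K}$, the first term is at most $\epsilon/2$ as soon as

$$
K \ge \frac{1}{c\eta} \log\frac{2 C_0}{\epsilon} = \frac{4 C_1^{2}}{c\,\epsilon^{2}} \log\frac{2 C_0}{\epsilon},
$$

which gives $\mathcal{W}_2 \le \epsilon$ with $K = O\big(\log(1/\epsilon)/\epsilon^{2}\big)$ and $\eta = O(\epsilon^{2})$.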

Abstract

Sampling from a target distribution induced by training data is central to Bayesian learning, with Stochastic Gradient Langevin Dynamics (SGLD) serving as a key tool for scalable posterior sampling and decentralized variants enabling learning when data are distributed across a network of agents. This paper introduces DIGing-SGLD, a decentralized SGLD algorithm designed for scalable Bayesian learning in multi-agent systems operating over time-varying networks. Existing decentralized SGLD methods are restricted to static network topologies, and many exhibit steady-state sampling bias caused by network effects, even when full batches are used. DIGing-SGLD overcomes these limitations by integrating Langevin-based sampling with the gradient-tracking mechanism of the DIGing algorithm, originally developed for decentralized optimization over time-varying networks, thereby enabling efficient and bias-free sampling without a central coordinator. To our knowledge, we provide the first finite-time non-asymptotic Wasserstein convergence guarantees for decentralized SGLD-based sampling over time-varying networks, with explicit constants. Under standard strong convexity and smoothness assumptions, DIGing-SGLD achieves geometric convergence to an $O(\sqrt{\eta})$ neighborhood of the target distribution, where $\eta$ is the stepsize, with dependence on the target accuracy matching the best-known rates for centralized and static-network SGLD algorithms using constant stepsize. Numerical experiments on Bayesian linear and logistic regression validate the theoretical results and demonstrate the strong empirical performance of DIGing-SGLD under dynamically evolving network conditions.
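
To make the idea of combining Langevin sampling with gradient tracking concrete, here is a minimal sketch in Python with NumPy. It is not taken from the paper: the update below is an assumed form that pairs the standard DIGing recursion (mix, step along a tracker, then correct the tracker with the gradient difference) with the usual $\sqrt{2\eta}$ Gaussian noise of Langevin dynamics, and it uses full local gradients. The names `tracking_langevin`, `mixing`, and `pair_averaging_matrix`, as well as the 4-agent alternating schedule, are hypothetical.

```python
import numpy as np

def pair_averaging_matrix(n, pairs):
    """Doubly stochastic matrix that averages the states of each listed pair."""
    W = np.eye(n)
    for i, j in pairs:
        W[i, i] = W[j, j] = 0.5
        W[i, j] = W[j, i] = 0.5
    return W

def mixing(k):
    """Hypothetical time-varying schedule on 4 agents: two matchings of a ring alternate."""
    pairs = [(0, 1), (2, 3)] if k % 2 == 0 else [(1, 2), (3, 0)]
    return pair_averaging_matrix(4, pairs)

def tracking_langevin(grad_fns, mixing, x0, eta, num_iters, rng):
    """Assumed gradient-tracking Langevin iteration (illustrative, not the paper's exact update).

    grad_fns : list of callables, grad_fns[i](x) is the gradient of agent i's local potential
    mixing   : callable k -> (n, n) doubly stochastic matrix used at iteration k
    x0       : (n, d) array with one row of parameters per agent
    """
    n, d = x0.shape
    x = x0.astype(float).copy()
    g_prev = np.stack([grad_fns[i](x[i]) for i in range(n)])
    y = g_prev.copy()  # tracker starts at the local gradients
    for k in range(num_iters):
        W = mixing(k)
        noise = rng.standard_normal((n, d))  # independent Gaussian noise per agent
        # Mix with neighbours, step along the tracked gradient, inject Langevin noise.
        x = W @ x - eta * y + np.sqrt(2.0 * eta) * noise
        g_new = np.stack([grad_fns[i](x[i]) for i in range(n)])
        # Tracker update: mix, then add the change in local gradients.
        y = W @ y + g_new - g_prev
        g_prev = g_new
    return x, y
```

Because each mixing matrix is doubly stochastic, the average of the trackers equals the average of the current local gradients at every iteration; this is the property gradient tracking relies on, and no single agent ever needs the full data. Each matrix in the schedule connects only pairs of agents, but the union over two consecutive iterations is a connected ring, which mimics a time-varying network that is connected only over a window.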

Paper Structure

This paper contains 33 sections, 19 theorems, 155 equations, 3 figures.

Key Result

Theorem 3.4

Consider the DIGing-SGLD algorithm with constant stepsize $\eta>0$. Assume that $\left\Vert x^{(0)}\right\Vert_{L_{2}}$ is finite. Let $\alpha,\beta>0$ be fixed scalars, and let $\lambda \in (\delta^{1/B},1)$, where $\delta\in(0,1)$ is as given in Assumption assump:W. The stepsize $\eta>0$ is chosen where $\gamma_{1},\gamma_{2},\gamma_{3},\gamma_{4}$ are defined in defn:gamma:1:2:main--defn:gamma:

Figures (3)

  • Figure 1: Illustrations of the two undirected time-varying network structures used in our experiments.
  • Figure 2: Comparison of DIGing-SGLD and DE-SGLD for Bayesian linear regression on synthetic data under time-varying barbell and lollipop network structures. Each plot displays the average Wasserstein distance across agents, with one standard deviation shown as a shaded region.
  • Figure 3: Performance comparison of DIGing-SGLD and DE-SGLD for Bayesian logistic regression under time-varying barbell and lollipop network structures. Each plot displays the average classification accuracy across agents and independent trials, with one standard deviation across agents shown as a shaded region.
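
Figure 2 reports a Wasserstein distance, but this summary does not say how it is estimated. One common choice, assumed here purely for illustration, is to compare Gaussian approximations: for $\mathcal{N}(m_1, \Sigma_1)$ and $\mathcal{N}(m_2, \Sigma_2)$ the 2-Wasserstein distance has the closed form $\mathcal{W}_2^2 = \Vert m_1 - m_2\Vert^2 + \mathrm{tr}\big(\Sigma_1 + \Sigma_2 - 2(\Sigma_2^{1/2}\Sigma_1\Sigma_2^{1/2})^{1/2}\big)$. The helper names `psd_sqrt` and `gaussian_w2` below are hypothetical.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a positive semidefinite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def gaussian_w2(m1, S1, m2, S2):
    """2-Wasserstein distance between N(m1, S1) and N(m2, S2) (closed form)."""
    root = psd_sqrt(S2)
    inner = root @ S1 @ root
    inner = 0.5 * (inner + inner.T)  # symmetrise against round-off
    cross = psd_sqrt(inner)
    w2_sq = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross)
    return float(np.sqrt(max(w2_sq, 0.0)))
```

In an experiment like the one in Figure 2, `m1` and `S1` could be the sample mean and covariance of one agent's iterates and `m2`, `S2` the mean and covariance of the reference posterior, with the result averaged over agents; that workflow is an assumption, not a description of the authors' code.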

Theorems & Definitions (48)

  • Theorem 3.4
  • Theorem 3.5
  • Remark 1: Interpretations of $E_{1},E_{2}$ and $E_{3}$
  • Remark 2: Feasibility of parameter choices
  • Lemma 3.6
  • proof
  • Corollary 3.7
  • proof
  • Remark 3
  • Lemma 3.8
  • ...and 38 more