Generalized EXTRA stochastic gradient Langevin dynamics
Mert Gurbuzbalaban, Mohammad Rafiqul Islam, Xiaoyu Wang, Lingjiong Zhu
TL;DR
This work tackles decentralized Bayesian inference, where data are distributed across a network of agents and privacy constraints prevent data sharing. It introduces generalized EXTRA SGLD, an EXTRA-inspired stochastic gradient Langevin dynamics algorithm that removes the network-induced bias of existing decentralized SGLD (DE-SGLD) methods in the full-batch setting and accelerates convergence to the posterior $\pi(x) \propto e^{-f(x)}$. The authors establish non-asymptotic 2-Wasserstein guarantees for the averaged iterates and, under $\mu$-strong convexity and $L$-smoothness, show an $\tilde{O}(L^2)$ improvement in iteration complexity over prior DE-SGLD methods. Numerical experiments on Bayesian linear and logistic regression tasks support the theory, showing faster convergence, improved consensus, and robustness across various network topologies. Overall, generalized EXTRA SGLD offers a principled, scalable approach to distributed posterior sampling with provable guarantees.
Abstract
Langevin algorithms are popular Markov chain Monte Carlo methods for Bayesian learning, particularly when the aim is to sample from the posterior distribution of a parametric model, given the input data and the prior distribution over the model parameters. Their stochastic versions, such as stochastic gradient Langevin dynamics (SGLD), allow iterative learning based on randomly sampled mini-batches and therefore scale to large datasets. However, when data are decentralized across a network of agents subject to communication and privacy constraints, standard SGLD algorithms cannot be applied. Instead, we employ decentralized SGLD (DE-SGLD) algorithms, in which Bayesian learning is performed collaboratively by a network of agents without sharing individual data. Nonetheless, existing DE-SGLD algorithms induce a bias at every agent that can negatively impact performance; this bias persists even when using full batches and is attributable to network effects. Motivated by the EXTRA algorithm and its generalizations for decentralized optimization, we propose the generalized EXTRA stochastic gradient Langevin dynamics, which eliminates this bias in the full-batch setting. Moreover, we show that, in the mini-batch setting, our algorithm provides performance bounds that significantly improve upon those of standard DE-SGLD algorithms in the literature. Our numerical results also demonstrate the efficiency of the proposed approach.
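To make the network-induced bias concrete, the following is a minimal sketch of a DE-SGLD-style baseline update, not the generalized EXTRA SGLD algorithm itself, whose update rule is not stated in this section. It assumes the commonly used form in which each agent averages its neighbors' iterates with a doubly stochastic mixing matrix, takes a local gradient step, and adds Gaussian noise. The ring network, the quadratic local objectives $f_i(x) = \frac{1}{2}(x - a_i)^2$, the step size, and all function names are hypothetical choices made for illustration.

```python
import math
import random


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring: 1/2 on self, 1/4 per neighbor."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] += 0.5
        W[i][(i - 1) % n] += 0.25
        W[i][(i + 1) % n] += 0.25
    return W


def de_sgld_step(x, W, grad, eta, rng=None):
    """One DE-SGLD-style update for scalar iterates x[i], one per agent.

    Assumed form: x_i <- sum_j W[i][j] * x_j - eta * grad(i, x_i) + sqrt(2 * eta) * noise_i.
    Passing rng=None drops the Gaussian noise so the network effect is isolated.
    """
    n = len(x)
    new_x = []
    for i in range(n):
        mixed = sum(W[i][j] * x[j] for j in range(n))  # neighbor averaging
        noise = rng.gauss(0.0, 1.0) if rng is not None else 0.0
        new_x.append(mixed - eta * grad(i, x[i]) + math.sqrt(2.0 * eta) * noise)
    return new_x


# Hypothetical local objectives f_i(x) = 0.5 * (x - a[i]) ** 2, one per agent.
a = [0.0, 1.0, 2.0, 3.0]
a_bar = sum(a) / len(a)  # minimizer of the sum of the local objectives


def local_grad(i, xi):
    """Full-batch gradient of agent i's local objective."""
    return xi - a[i]


W = ring_mixing_matrix(len(a))
x = [0.0] * len(a)
for _ in range(500):
    x = de_sgld_step(x, W, local_grad, eta=0.1)  # noise switched off

network_average = sum(x) / len(x)
max_agent_gap = max(abs(xi - a_bar) for xi in x)
print("network average:", network_average, "target:", a_bar)
print("largest per-agent gap:", max_agent_gap)
```

With the noise switched off and full-batch gradients, the network average in this toy problem settles at the minimizer of the summed objective, while the individual agents settle at distinct points away from it. This is a small-scale analogue of the per-agent bias described above. Passing `rng=random.Random(0)` to `de_sgld_step` restores the Langevin noise term.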
