Generalized EXTRA stochastic gradient Langevin dynamics

Mert Gurbuzbalaban, Mohammad Rafiqul Islam, Xiaoyu Wang, Lingjiong Zhu

TL;DR

This work tackles decentralized Bayesian inference where data are distributed across a network and privacy constraints prevent data sharing. It introduces generalized EXTRA SGLD, an EXTRA-inspired stochastic gradient Langevin dynamics algorithm, to remove network-induced bias and accelerate convergence to the posterior $\pi(x) \propto e^{-f(x)}$. The authors establish non-asymptotic 2-Wasserstein guarantees for the averaged iterates and provide a rigorous comparison showing an $\tilde{O}(L^2)$ improvement in iteration complexity over prior DE-SGLD methods, under $\mu$-strong convexity and $L$-smoothness. They validate the theory with numerical experiments on Bayesian linear and logistic regression tasks, demonstrating faster convergence, improved consensus, and robustness across various network topologies. Overall, the generalized EXTRA SGLD framework offers a principled, scalable approach to distributed posterior sampling with provable guarantees.

Abstract

Langevin algorithms are popular Markov Chain Monte Carlo methods for Bayesian learning, particularly when the aim is to sample from the posterior distribution of a parametric model, given the input data and the prior distribution over the model parameters. Their stochastic versions such as stochastic gradient Langevin dynamics (SGLD) allow iterative learning based on randomly sampled mini-batches of large datasets and are scalable to large datasets. However, when data is decentralized across a network of agents subject to communication and privacy constraints, standard SGLD algorithms cannot be applied. Instead, we employ decentralized SGLD (DE-SGLD) algorithms, where Bayesian learning is performed collaboratively by a network of agents without sharing individual data. Nonetheless, existing DE-SGLD algorithms induce a bias at every agent that can negatively impact performance; this bias persists even when using full batches and is attributable to network effects. Motivated by the EXTRA algorithm and its generalizations for decentralized optimization, we propose the generalized EXTRA stochastic gradient Langevin dynamics, which eliminates this bias in the full-batch setting. Moreover, we show that, in the mini-batch setting, our algorithm provides performance bounds that significantly improve upon those of standard DE-SGLD algorithms in the literature. Our numerical results also demonstrate the efficiency of the proposed approach.
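To make the network-induced bias concrete, here is a minimal sketch of a baseline DE-SGLD iteration on a toy problem. It is not the generalized EXTRA scheme proposed in the paper. The update rule $x_i^{(k+1)} = \sum_j W_{ij} x_j^{(k)} - \eta \nabla f_i(x_i^{(k)}) + \sqrt{2\eta}\,\xi_i^{(k)}$ is assumed from the form commonly used in the DE-SGLD literature; the scalar quadratic potentials $f_i(x) = (x - a_i)^2/2$, the ring graph with weights $1/3$, and the names `ring_weights` and `de_sgld` are all hypothetical choices made for illustration.

```python
import math
import random


def ring_weights(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n >= 3 agents.

    Each agent puts weight 1/3 on itself and on each of its two neighbors
    (an illustrative choice, not taken from the paper).
    """
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] = 1.0 / 3.0
    return W


def de_sgld(W, a, eta, num_iters, noise_scale=1.0, seed=0):
    """Assumed DE-SGLD update for local potentials f_i(x) = (x - a_i)^2 / 2.

    Returns the final per-agent iterates and the time average of the
    network mean (1/N) * sum_i x_i over all iterations.
    """
    rng = random.Random(seed)
    n = len(a)
    x = [0.0] * n
    mean_sum = 0.0
    for _ in range(num_iters):
        # Consensus step: each agent averages its neighbors' iterates.
        mixed = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Local gradient step plus injected Gaussian noise.
        x = [
            mixed[i]
            - eta * (x[i] - a[i])
            + noise_scale * math.sqrt(2.0 * eta) * rng.gauss(0.0, 1.0)
            for i in range(n)
        ]
        mean_sum += sum(x) / n
    return x, mean_sum / num_iters


if __name__ == "__main__":
    a = [0.0, 1.0, 2.0, 3.0, 4.0]  # heterogeneous local data (hypothetical)
    W = ring_weights(len(a))
    # noise_scale=0.0 isolates the deterministic part of the recursion.
    x_det, _ = de_sgld(W, a, eta=0.05, num_iters=3000, noise_scale=0.0)
    print("per-agent fixed point:", x_det)
    print("network mean:", sum(x_det) / len(x_det))
```

In this toy setting the target posterior has mean $\bar{a} = \frac{1}{N}\sum_i a_i$. With the noise switched off, the network mean converges to $\bar{a}$, but the individual agents settle at different values, so each agent is biased even though every gradient is exact. This is one way to picture the bias "attributable to network effects" that the abstract refers to.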

Paper Structure

This paper contains 38 sections, 16 theorems, 260 equations, 10 figures, 2 tables.

Key Result

Theorem 4

Consider the generalized EXTRA Langevin dynamics with the network averaging matrix $\widetilde{W} = hI_N + (1-h)W$, where [...], and assume that the stepsize $\eta$ is chosen to satisfy [...], where $\gamma_1$, $\gamma_2$, $\gamma_{\widetilde{W}}$, and $\overline{\gamma}_{I_N - W}^2$ are constants defined in the paper's table of constants. Then, for any $K \geq K_0$, the following bo...
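The averaging matrix $\widetilde{W} = hI_N + (1-h)W$ in the theorem can be explored numerically. The sketch below builds it for a ring graph and checks two facts that follow from the definition: $\widetilde{W}$ stays symmetric and doubly stochastic whenever $W$ is, and every eigenvalue $\lambda$ of $W$ is mapped to $h + (1-h)\lambda$. The ring topology, the $1/3$ weights, the value $h = 1/2$, and the helper names are assumptions made for illustration; the admissible range of $h$ is not restated here.

```python
import math


def ring_weights(n):
    """Symmetric, doubly stochastic W for a ring of n >= 3 agents (weights 1/3)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] = 1.0 / 3.0
    return W


def lazy_mixing_matrix(W, h):
    """Return W_tilde = h * I + (1 - h) * W as a list of rows."""
    n = len(W)
    return [
        [(h if i == j else 0.0) + (1.0 - h) * W[i][j] for j in range(n)]
        for i in range(n)
    ]


def matvec(M, v):
    """Plain matrix-vector product."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


if __name__ == "__main__":
    n, h = 6, 0.5
    W = ring_weights(n)
    W_tilde = lazy_mixing_matrix(W, h)

    # For this ring, v_j = cos(theta * j) with theta = 2*pi/n is an eigenvector
    # of W with eigenvalue lam = (1 + 2*cos(theta)) / 3.
    theta = 2.0 * math.pi / n
    v = [math.cos(theta * j) for j in range(n)]
    lam = (1.0 + 2.0 * math.cos(theta)) / 3.0

    shifted = h + (1.0 - h) * lam  # predicted eigenvalue of W_tilde
    residual = max(abs(y - shifted * x) for y, x in zip(matvec(W_tilde, v), v))
    print("row sums:", [sum(row) for row in W_tilde])
    print("eigenvalue residual:", residual)
```

With $h = 1/2$ this reduces to $\widetilde{W} = (I_N + W)/2$, the choice used in the original EXTRA algorithm for decentralized optimization. For $h \in [0, 1)$, larger $h$ moves every eigenvalue toward $1$, which lifts negative eigenvalues of $W$ at the cost of slower mixing per communication round.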

Figures (10)

  • Figure 2: Performance of the EXTRA SGLD for Bayesian linear regression on four different network structures. Out of $20$ agents, we report only the first $4$ agents and the mean of the nodes $\bar{\beta}^{(k)}=\frac{1}{N}\sum_{i=1}^{N}\beta_i^{(k)}$.
  • Figure 3: Comparative performance of the DE-SGLD and EXTRA SGLD for Bayesian linear regression on four different network structures in terms of the $\mathcal{W}_2$ distance of the mean of the agents.
  • Figure 4: Histogram of the comparative performances of the DE-SGLD and EXTRA SGLD for Bayesian linear regression on four different network structures.
  • Figure 5: Accuracy distribution of the EXTRA SGLD method across different network structures at a randomly selected node.
  • Figure 6: Comparative accuracy distribution of the DE-SGLD and EXTRA SGLD methods across different network structures on the Breast Cancer data set. The plots are from a randomly selected node.
  • ...and 5 more figures

Theorems & Definitions (16)

  • Theorem 4
  • Theorem 4
  • Proposition 5
  • Lemma 6
  • Lemma 7
  • Lemma 8
  • Lemma 9
  • Lemma 10
  • Lemma 11
  • Lemma 12
  • ...and 6 more