Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers

Alexandros E. Tzikas, Licio Romao, Mert Pilanci, Alessandro Abate, Mykel J. Kochenderfer

TL;DR

This work introduces D-ADMMS, a distributed MCMC sampler built by injecting noise into the proximal step of a consensus ADMM framework to sample from a joint posterior $\mu^*(x) \propto \exp(-F(x))$ with $F(x)=\sum_i f_i(x)$. The authors prove convergence of the iterates’ distribution to the target in the Wasserstein metric and provide a recursive inequality that connects the graph topology, function conditioning, and noise terms to the convergence rate. They show that, under a condition like $2 m_f \delta > 1$, there exists a contraction factor depending on $\tau_f$ and $\tau_G$, and demonstrate faster convergence than gradient-based distributed samplers in Bayesian linear and logistic regression, especially on sparsely connected networks. The results highlight the potential and limitations of distributed sampling with proximal noise, offering a practical approach when data are privacy-constrained and centralized aggregation is impractical.
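
To make the decomposition $F(x)=\sum_i f_i(x)$ concrete, the following is a minimal sketch for Bayesian linear regression, one of the two tasks the paper evaluates. It assumes Gaussian observation noise with known variance and a zero-mean isotropic Gaussian prior split evenly across agents. These modelling choices, the problem sizes, and the names `local_potential` and `global_posterior` are illustrative assumptions, not the authors' setup. Under them each $f_i$ is quadratic, so the target $\mu^*$ is Gaussian and its mean and covariance are available in closed form as a reference.

```python
import numpy as np

def local_potential(A_i, y_i, noise_var, prior_var, num_agents):
    """Return (P_i, b_i) such that f_i(x) = 0.5 x^T P_i x - b_i^T x + const.

    Assumption: agent i holds data (A_i, y_i) with Gaussian noise and carries
    a 1/num_agents share of a N(0, prior_var * I) prior.
    """
    d = A_i.shape[1]
    P_i = A_i.T @ A_i / noise_var + np.eye(d) / (prior_var * num_agents)
    b_i = A_i.T @ y_i / noise_var
    return P_i, b_i

def global_posterior(potentials):
    """Combine local quadratics: F = sum_i f_i gives a Gaussian target."""
    P = sum(P_i for P_i, _ in potentials)   # precision of exp(-F)
    b = sum(b_i for _, b_i in potentials)
    cov = np.linalg.inv(P)
    mean = cov @ b
    return mean, cov

# Synthetic data (illustrative sizes only).
rng = np.random.default_rng(0)
d, num_agents, n_i = 2, 5, 50
x_true = rng.normal(size=d)
data = []
for _ in range(num_agents):
    A_i = rng.normal(size=(n_i, d))
    y_i = A_i @ x_true + rng.normal(size=n_i)
    data.append((A_i, y_i))

potentials = [local_potential(A, y, 1.0, 1.0, num_agents) for A, y in data]
mean, cov = global_posterior(potentials)
```

The sketch forms the target centrally only to obtain a ground truth for comparison. In the distributed setting described above, each agent would work with its own $f_i$ and exchange information with its neighbors rather than pooling the data.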

Abstract

Many machine learning applications require operating on a spatially distributed dataset. Despite technological advances, privacy considerations and communication constraints may prevent gathering the entire dataset in a central unit. In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers, which is commonly used in the optimization literature due to its fast convergence. In contrast to distributed optimization, distributed sampling allows for uncertainty quantification in Bayesian inference tasks. We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art. For our theoretical results, we use convex optimization tools to establish a fundamental inequality on the generated local sample iterates. This inequality enables us to show convergence of the distribution associated with these iterates to the underlying target distribution in Wasserstein distance. In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.

Paper Structure

This paper contains 25 sections, 4 theorems, 96 equations, 6 figures, and 1 algorithm.

Key Result

Lemma 1

Define $\beta \in \mathbb{R}^{\lvert \mathcal{A} \rvert d}$. The update equations of D-ADMMS in Algorithm 1 can be derived from the iterates where $X^{(k)}$ is the concatenation of the $x_i^{(k)}$ from Algorithm 1.

Figures (6)

  • Figure 1: $2$-Wasserstein distance to target distribution vs iteration for $n_i=50$. Both the distance to the target distribution of the average iterate (avg) and a specific agent iterate (ag) are provided for each method. For the sparsely connected (cyclic) graph topology, our proposed algorithm (D-ADMMS) outperforms the baselines (D-SGLD, D-ULA, D-SGHMC) in terms of Wasserstein distance between the distribution of the agent iterate and the target distribution.
  • Figure 2: $2$-Wasserstein distance to target distribution vs iteration for $n_i=200$. Both the distance to the target distribution of the average iterate (avg) and a specific agent iterate (ag) are provided for each method. For the sparsely connected (cyclic) graph topology, our proposed algorithm (D-ADMMS) outperforms the baselines (D-SGLD, D-ULA, D-SGHMC) in terms of Wasserstein distance between the distribution of the agent iterate and the target distribution.
  • Figure 3: $2$-Wasserstein distance of an agent's iterate to the target distribution for varying $\rho$ in D-ADMMS.
  • Figure 4: $2$-Wasserstein distance of an agent's iterate to the target distribution for varying initial sample distribution in D-ADMMS. Standard refers to $x_i^{(0)} \sim \mathcal{N}(0, I)$.
  • Figure 5: Evolution of the agents' sample distributions in D-ADMMS for a cyclic network of five agents. Each color corresponds to the samples of a different agent. We also include the true global posterior distribution up to a scaling factor.
  • ...and 1 more figure
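
The figures above report the $2$-Wasserstein distance between a sample distribution and the target. When both distributions are Gaussian (or are approximated by Gaussians fitted to samples), this distance has a well-known closed form, $W_2^2 = \lVert m_1 - m_2 \rVert^2 + \operatorname{Tr}\big(S_1 + S_2 - 2 (S_2^{1/2} S_1 S_2^{1/2})^{1/2}\big)$. The sketch below implements that formula. Whether the paper computes its curves this way is not stated in this summary, so treat the Gaussian approximation and the function names as assumptions.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a symmetric PSD matrix via eigendecomposition."""
    S = 0.5 * (S + S.T)                 # remove tiny numerical asymmetry
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)           # guard against small negative eigenvalues
    return (V * np.sqrt(w)) @ V.T

def gaussian_w2(m1, S1, m2, S2):
    """2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    root2 = psd_sqrt(S2)
    cross = psd_sqrt(root2 @ S1 @ root2)
    sq = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross)
    return float(np.sqrt(max(sq, 0.0)))

def empirical_gaussian_w2(samples, target_mean, target_cov):
    """Fit a Gaussian to samples (one row per sample) and compare to the target."""
    m = samples.mean(axis=0)
    S = np.atleast_2d(np.cov(samples, rowvar=False))
    return gaussian_w2(m, S, target_mean, target_cov)
```

For example, `empirical_gaussian_w2(agent_samples, mean, cov)` would give one point on a curve like those in Figures 1 and 2, where `agent_samples` is a hypothetical array of one agent's iterates collected across independent runs.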

Theorems & Definitions (4)

  • Lemma 1
  • Lemma 2
  • Theorem 3
  • Lemma 4