Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers
Alexandros E. Tzikas, Licio Romao, Mert Pilanci, Alessandro Abate, Mykel J. Kochenderfer
TL;DR
This work introduces D-ADMMS, a distributed MCMC sampler that injects noise into the proximal step of consensus ADMM in order to sample from a joint posterior $\mu^*(x) \propto \exp(-F(x))$ with $F(x)=\sum_i f_i(x)$. The authors prove that the distribution of the iterates converges to the target in the Wasserstein metric, and they derive a recursive inequality that ties the convergence rate to the graph topology, the conditioning of the functions, and the noise terms. Under a condition like $2 m_f \delta > 1$, they show that a contraction factor depending on $\tau_f$ and $\tau_G$ exists. In Bayesian linear and logistic regression experiments, D-ADMMS converges faster than gradient-based distributed samplers, especially on sparsely connected networks. The results highlight both the potential and the limitations of distributed sampling with proximal noise, and they offer a practical approach when data are privacy-constrained and centralized aggregation is impractical.
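For reference, the target and the notion of convergence mentioned above can be written out as follows. The summary does not state the order of the Wasserstein metric, so the order 2 and the Euclidean cost used here are assumptions; $\Gamma(\mu,\nu)$ denotes the set of couplings of $\mu$ and $\nu$, and $\mu_k$ is a name introduced here for the distribution of the iterates at step $k$.

$$
\mu^*(x) = \frac{\exp(-F(x))}{\int \exp(-F(y))\,dy}, \qquad F(x) = \sum_i f_i(x),
$$

$$
W_2(\mu_k, \mu^*) = \left( \inf_{\gamma \in \Gamma(\mu_k, \mu^*)} \int \|x - y\|^2 \, d\gamma(x, y) \right)^{1/2}.
$$

Convergence in this sense means that $W_2(\mu_k, \mu^*)$ becomes small as $k$ grows.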
Abstract
Many machine learning applications require operating on a spatially distributed dataset. Despite technological advances, privacy considerations and communication constraints may prevent gathering the entire dataset in a central unit. In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers, which is commonly used in the optimization literature due to its fast convergence. In contrast to distributed optimization, distributed sampling allows for uncertainty quantification in Bayesian inference tasks. We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority over the state of the art. For our theoretical results, we use convex optimization tools to establish a fundamental inequality on the generated local sample iterates. This inequality enables us to show that the distribution associated with these iterates converges to the underlying target distribution in Wasserstein distance. In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
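To make the distributed setting concrete, here is a minimal Python sketch of how a Bayesian linear regression posterior can be split into per-agent potentials $f_i$ whose sum is the centralized potential, together with a single proximal step followed by added Gaussian noise. This is an illustration under stated assumptions, not the D-ADMMS update itself: the function names, the equal split of a Gaussian prior across agents, the penalty `rho`, and the free `noise_scale` parameter are all hypothetical choices made for this example.

```python
import numpy as np

def local_potential(x, A_i, b_i, n_agents, noise_var=1.0, prior_var=1.0):
    """Agent i's potential: its negative log-likelihood plus a 1/n share of the prior."""
    residual = A_i @ x - b_i
    return 0.5 * residual @ residual / noise_var + 0.5 * (x @ x) / (prior_var * n_agents)

def global_potential(x, A, b, noise_var=1.0, prior_var=1.0):
    """Centralized potential F(x) built from the full dataset."""
    residual = A @ x - b
    return 0.5 * residual @ residual / noise_var + 0.5 * (x @ x) / prior_var

def noisy_prox_quadratic(v, A_i, b_i, n_agents, rho, noise_scale, rng,
                         noise_var=1.0, prior_var=1.0):
    """Solve argmin_x f_i(x) + (rho/2)||x - v||^2 in closed form, then add Gaussian noise.

    noise_scale is a free illustrative parameter, not the scaling used in the paper.
    """
    d = v.shape[0]
    # f_i(x) = 0.5 x^T H x - g^T x + const for the quadratic potential above.
    H = A_i.T @ A_i / noise_var + np.eye(d) / (prior_var * n_agents)
    g = A_i.T @ b_i / noise_var
    x_prox = np.linalg.solve(H + rho * np.eye(d), g + rho * v)
    return x_prox + noise_scale * rng.standard_normal(d)

# Synthetic data, split evenly across agents (all sizes are arbitrary).
rng = np.random.default_rng(0)
n_agents, rows_per_agent, d = 4, 5, 3
A = rng.standard_normal((n_agents * rows_per_agent, d))
b = rng.standard_normal(n_agents * rows_per_agent)
A_parts = np.split(A, n_agents)
b_parts = np.split(b, n_agents)

x = rng.standard_normal(d)
total = sum(local_potential(x, A_i, b_i, n_agents)
            for A_i, b_i in zip(A_parts, b_parts))

v = rng.standard_normal(d)
sample = noisy_prox_quadratic(v, A_parts[0], b_parts[0], n_agents,
                              rho=2.0, noise_scale=0.1, rng=rng)
```

Here `total` matches `global_potential(x, A, b)` up to floating-point error, which is the sense in which each agent can work only with its own data block while the sum of the local potentials still defines the joint target. Setting `noise_scale=0.0` recovers the ordinary deterministic proximal step.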
