Diffusion Generative Modelling for Divide-and-Conquer MCMC
C. Trojan, P. Fearnhead, C. Nemeth
TL;DR
This work addresses the problem of merging subposteriors in divide-and-conquer MCMC without assuming that they are Gaussian. It fits diffusion-based approximations to the unnormalised subposterior densities and combines them via annealed MCMC to sample from the full posterior $p^{\text{full}}(\theta)$; the annealing path consists of densities that interpolate between a Gaussian prior at $t=1$ and the learned non-Gaussian target at $t=0$. The approach uses score-matching objectives, energy-based parameterisations, and reparameterised SDEs to align the subposterior priors and to enable efficient sampling from complex, high-dimensional posteriors. Empirical results on toy and real datasets show improved accuracy over traditional merging methods, particularly when the subposteriors are non-Gaussian, multimodal, or poorly overlapping, which suggests a practical route to scalable Bayesian inference on large datasets.
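The merging idea can be illustrated with a minimal sketch. When the full posterior is proportional to a product of subposteriors, the score (gradient of the log density) of the product is the sum of the subposterior scores, so a gradient-based sampler only needs that sum. The example below rests on assumptions that are not part of the summary above: the subposteriors are one-dimensional Gaussians with known scores standing in for learned diffusion or energy-based models, and the sampler is plain unadjusted Langevin dynamics rather than the annealed MCMC scheme of the paper. All function names and numerical values are hypothetical.

```python
import numpy as np

def gaussian_score(theta, mean, var):
    # Score of N(mean, var): d/dtheta log density.
    # Stands in for a learned subposterior score model (assumption).
    return -(theta - mean) / var

def merged_score(theta, means, variances):
    # Score of the product of subposteriors = sum of the individual scores.
    return sum(gaussian_score(theta, m, v) for m, v in zip(means, variances))

def langevin_sample(score_fn, n_chains=5000, n_steps=500, step=0.01, seed=0):
    # Unadjusted Langevin dynamics, run as independent parallel chains.
    # This is a simplification, not the annealed scheme described in the paper.
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_chains)
    for _ in range(n_steps):
        noise = rng.standard_normal(n_chains)
        theta = theta + step * score_fn(theta) + np.sqrt(2.0 * step) * noise
    return theta

# Three hypothetical Gaussian subposteriors (illustrative values only).
means = np.array([-1.0, 0.5, 2.0])
variances = np.array([1.0, 0.5, 2.0])

samples = langevin_sample(lambda th: merged_score(th, means, variances))

# Closed-form product of Gaussians, used only to check the sketch.
precisions = 1.0 / variances
exact_var = 1.0 / precisions.sum()
exact_mean = exact_var * (precisions * means).sum()

print("sample mean/var:", samples.mean(), samples.var())
print("exact  mean/var:", exact_mean, exact_var)
```

In the Gaussian case the merged posterior is available in closed form, which is what makes the check possible; the motivation for learned density approximations is precisely the setting where no such closed form exists.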
Abstract
Divide-and-conquer MCMC is a strategy for parallelising Markov chain Monte Carlo sampling by running independent samplers on disjoint subsets of a dataset and merging their outputs. An ongoing challenge in the literature is to perform this merging efficiently without imposing distributional assumptions on the posteriors. We propose using diffusion generative modelling to fit density approximations to the subposterior distributions. This approach outperforms existing methods on challenging merging problems, while its computational cost scales more efficiently to high-dimensional problems than existing density estimation approaches.
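To make the merging problem concrete, the factorisation commonly assumed in this literature can be written as follows. This is a sketch under assumptions not stated in the abstract: the data $x$ are split into $B$ blocks $x_1, \dots, x_B$ that are conditionally independent given $\theta$, and each subposterior uses the prior raised to the power $1/B$. The symbols $B$, $x_b$ and $p_b$ are introduced here for illustration only.

$$
p^{\text{full}}(\theta) \;\propto\; p(\theta) \prod_{b=1}^{B} p(x_b \mid \theta) \;=\; \prod_{b=1}^{B} p(\theta)^{1/B}\, p(x_b \mid \theta) \;\propto\; \prod_{b=1}^{B} p_b(\theta), \qquad p_b(\theta) \;\propto\; p(\theta)^{1/B}\, p(x_b \mid \theta).
$$

Each sampler targets one $p_b$ independently. Merging then amounts to sampling from the product of the $p_b$ given only samples from each factor, which is why a density approximation for every subposterior is useful.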
