Diffusion Generative Modelling for Divide-and-Conquer MCMC

C. Trojan, P. Fearnhead, C. Nemeth

TL;DR

This work tackles the challenge of merging subposteriors in divide-and-conquer MCMC without assuming they are Gaussian. It fits diffusion-based approximations to the unnormalised subposterior densities and combines them via annealed MCMC to sample the full posterior $p^{\text{full}}(\theta)$, moving through a sequence of densities that interpolates between a Gaussian prior at $t=1$ and the learned non-Gaussian target at $t=0$. The approach uses score-matching objectives, energy-based parameterisations, and reparameterised SDEs to align subposterior priors and enable efficient sampling from complex, high-dimensional posteriors. Empirical results on toy and real datasets show improved accuracy over traditional merging methods, particularly when the subposteriors are non-Gaussian, multimodal, or poorly overlapping, which suggests a practical route to scalable Bayesian inference on large datasets.
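
The annealing idea can be illustrated with a minimal sketch; this is not the paper's algorithm. It assumes one-dimensional Gaussian subposteriors whose noised scores are known in closed form, standing in for learned diffusion models. The noising schedule, the unadjusted Langevin sampler, and the names `product_score`, `annealed_merge`, and `exact_product` are all hypothetical choices made for illustration. The product of noised densities is in general not the noised version of the product, which is why the sketch runs MCMC at every noise level rather than simulating a single reverse diffusion.

```python
import numpy as np


def product_score(theta, t, means, stds):
    """Score of the product of noised Gaussian subposteriors at level t.

    Assumed noising (illustrative only):
        theta_t = sqrt(1 - t) * theta_0 + sqrt(t) * eps,  eps ~ N(0, 1),
    so N(m_b, s_b^2) becomes N(sqrt(1 - t) * m_b, (1 - t) * s_b^2 + t).
    """
    noised_means = np.sqrt(1.0 - t) * means   # shape (B,)
    noised_vars = (1.0 - t) * stds**2 + t     # shape (B,)
    # Scores of a product of densities add, so sum over the B subposteriors.
    return np.sum(-(theta[:, None] - noised_means) / noised_vars, axis=1)


def annealed_merge(means, stds, n_samples=5000, n_levels=100,
                   steps_per_level=20, step_size=0.01, seed=0):
    """Anneal from t = 1 to t = 0 with unadjusted Langevin steps."""
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    rng = np.random.default_rng(seed)
    # At t = 1 each noised subposterior is N(0, 1); their product is N(0, 1/B).
    theta = rng.normal(0.0, np.sqrt(1.0 / len(means)), size=n_samples)
    for t in np.linspace(1.0, 0.0, n_levels):
        for _ in range(steps_per_level):
            noise = rng.normal(size=n_samples)
            theta = (theta
                     + step_size * product_score(theta, t, means, stds)
                     + np.sqrt(2.0 * step_size) * noise)
    return theta


def exact_product(means, stds):
    """Mean and standard deviation of the exact product of Gaussians."""
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    precision = np.sum(1.0 / stds**2)
    return np.sum(means / stds**2) / precision, np.sqrt(1.0 / precision)


if __name__ == "__main__":
    means, stds = [-1.0, 0.5, 2.0], [1.0, 0.8, 1.2]
    samples = annealed_merge(means, stds)
    print("sampled mean, std:", samples.mean(), samples.std())
    print("exact mean, std:  ", exact_product(means, stds))
```

In this Gaussian toy case the merged posterior is available exactly, so the sampled mean and standard deviation can be checked against `exact_product`. With learned, non-Gaussian subposterior models no such closed form exists, and the Langevin step carries a small step-size bias unless an accept/reject correction is added.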

Abstract

Divide-and-conquer MCMC is a strategy for parallelising Markov chain Monte Carlo (MCMC) sampling by running independent samplers on disjoint subsets of a dataset and merging their output. An ongoing challenge in the literature is to perform this merging efficiently without imposing distributional assumptions on the posteriors. We propose using diffusion generative modelling to fit density approximations to the subposterior distributions. This approach outperforms existing methods on challenging merging problems, while its computational cost scales more efficiently to high-dimensional problems than existing density estimation approaches.
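
For orientation, the merging problem is usually written as a product of subposteriors. The display below is the standard factorisation from the divide-and-conquer literature, not a formula quoted from this paper. It assumes the data shards are conditionally independent given $\theta$, and the symbols $B$ (number of shards), $x_b$ (shard $b$), and $p_b$ (subposterior $b$) are notation introduced here for illustration.

$$
p^{\text{full}}(\theta) \;\propto\; p(\theta) \prod_{b=1}^{B} p(x_b \mid \theta) \;=\; \prod_{b=1}^{B} \underbrace{p(\theta)^{1/B}\, p(x_b \mid \theta)}_{\propto\; p_b(\theta)}
$$

Each worker samples from its own $p_b$, so merging amounts to producing samples from the product when only samples from the individual factors are available.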

Paper Structure

This paper contains 46 sections, 19 equations, 5 figures, 5 tables, and 2 algorithms.

Figures (5)

  • Figure 1: Annealed diffusion sampling in the mixture of Gaussians example. Full posterior in black.
  • Figure 2: Merged posterior contour plots for the toy logistic regression example.
  • Figure 3: Mixture of Gaussians posterior contour plots for $\theta_1$ and $\theta_2$.
  • Figure 4: Merged posterior contour plots for first two parameters in the mixture of Gaussians example.
  • Figure 5: Subposterior contours for the logistic regression with full posterior in blue.