Diffusive Gibbs Sampling

Wenlin Chen, Mingtian Zhang, Brooks Paige, José Miguel Hernández-Lobato, David Barber

TL;DR

Diffusive Gibbs Sampling (DiGS) targets multi-modal unnormalized distributions by pairing Gaussian convolution with a Metropolis-within-Gibbs scheme on the joint space $p(x,\tilde{x})=p(\tilde{x}|x)p(x)$. It alternates between sampling the noisy variable $\tilde{x}$ and the denoised variable $x$, and uses an MH-based initialization for the denoising step, so it explores modes without requiring the intractable convolved score. The paper reports empirical gains over standard MCMC baselines (MALA, HMC, parallel tempering) on synthetic mixture-of-Gaussians (MoG) problems, Bayesian neural networks, and molecular configuration sampling, including substantial reductions in energy evaluations for MD-like tasks. A multi-level, variance-preserving noise schedule further improves efficiency. The method belongs to the family of auxiliary-variable MCMC samplers, and the paper leaves further theoretical and methodological improvements to future work.

Abstract

The inadequate mixing of conventional Markov Chain Monte Carlo (MCMC) methods for multi-modal distributions presents a significant challenge in practical applications such as Bayesian inference and molecular dynamics. Addressing this, we propose Diffusive Gibbs Sampling (DiGS), an innovative family of sampling methods designed for effective sampling from distributions characterized by distant and disconnected modes. DiGS integrates recent developments in diffusion models, leveraging Gaussian convolution to create an auxiliary noisy distribution that bridges isolated modes in the original space and applying Gibbs sampling to alternately draw samples from both spaces. A novel Metropolis-within-Gibbs scheme is proposed to enhance mixing in the denoising sampling step. DiGS exhibits a better mixing property for sampling multi-modal distributions than state-of-the-art methods such as parallel tempering, attaining substantially improved performance across various tasks, including mixtures of Gaussians, Bayesian neural networks and molecular dynamics.
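To make the alternating scheme concrete, here is a minimal Python sketch of one possible DiGS-style sampler on a toy 1D mixture of Gaussians. It is an illustration under stated assumptions, not the authors' implementation. The target (`MEANS`, `STD`), the function names, and all hyperparameter values are hypothetical. The initialization proposal $\mathcal{N}(x|\tilde{x}/\alpha,(\sigma/\alpha)^2)$ is an assumed choice; because it is proportional to the Gaussian kernel as a function of $x$, the MH ratio reduces to $p(x')/p(x)$. The denoising step uses random-walk Metropolis for brevity where a score-based sampler such as MALA could be substituted.

```python
import numpy as np

# Hypothetical toy target: equal-weight 1D mixture of two well-separated Gaussians.
MEANS = np.array([-4.0, 4.0])
STD = 0.5


def log_p(x):
    """Unnormalized log-density of the toy mixture."""
    return np.logaddexp.reduce(-0.5 * ((x - MEANS) / STD) ** 2)


def log_posterior(x, x_tilde, alpha, sigma):
    """log p(x | x_tilde) up to a constant: log p(x) + log N(x_tilde | alpha*x, sigma^2)."""
    return log_p(x) - 0.5 * ((x_tilde - alpha * x) / sigma) ** 2


def digs(n_samples, alpha=1.0, sigma=4.0, n_inner=20, step=0.5, seed=0):
    """Sketch of a single-noise-level DiGS-style sampler (assumed settings)."""
    rng = np.random.default_rng(seed)
    x = MEANS[0]  # start inside one mode on purpose
    samples = np.empty(n_samples)
    for i in range(n_samples):
        # Noising step: draw x_tilde ~ N(alpha * x, sigma^2).
        x_tilde = alpha * x + sigma * rng.standard_normal()

        # MH-based initialization (assumed proposal): x' ~ N(x_tilde/alpha, (sigma/alpha)^2).
        # The proposal cancels the Gaussian likelihood, so the ratio is p(x') / p(x).
        x_prop = x_tilde / alpha + (sigma / alpha) * rng.standard_normal()
        if np.log(rng.uniform()) < log_p(x_prop) - log_p(x):
            x = x_prop

        # Denoising step: random-walk Metropolis targeting p(x | x_tilde).
        current = log_posterior(x, x_tilde, alpha, sigma)
        for _ in range(n_inner):
            x_new = x + step * rng.standard_normal()
            candidate = log_posterior(x_new, x_tilde, alpha, sigma)
            if np.log(rng.uniform()) < candidate - current:
                x, current = x_new, candidate
        samples[i] = x
    return samples


if __name__ == "__main__":
    draws = digs(3000, n_inner=10)
    print("fraction of samples in the right-hand mode:", (draws > 0).mean())
```

With a noise scale comparable to the distance between modes, the initialization proposal occasionally lands in the other mode, which a local random-walk sampler started at `MEANS[0]` would practically never reach on its own.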

Paper Structure

This paper contains 36 sections, 1 theorem, 24 equations, 11 figures, 5 tables, 1 algorithm.

Key Result

Theorem 2.1

For an absolutely continuous target distribution $p(x)$, DiGS with a Gaussian convolution kernel $p(\tilde{x}|x)=\mathcal{N}(\tilde{x}|\alpha x,\sigma^2I)$ ($\alpha>0, \sigma>0$) yields a $p(x,\tilde{x})$-irreducible and recurrent Markov Chain.
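For reference, the distributions implied by this kernel can be written out explicitly. The block below is a sketch derived from the definitions above via Bayes' rule, not a restatement of the paper's numbered equations. The last line shows why the denoising step needs only the target score $\nabla_x \log p(x)$ and not the score of the convolved distribution $p(\tilde{x})$.

```latex
\begin{align}
p(x,\tilde{x}) &= p(x)\,\mathcal{N}(\tilde{x}\mid\alpha x,\sigma^2 I), \\
p(\tilde{x}) &= \int p(x)\,\mathcal{N}(\tilde{x}\mid\alpha x,\sigma^2 I)\,\mathrm{d}x, \\
p(x\mid\tilde{x}) &\propto p(x)\,\mathcal{N}(\tilde{x}\mid\alpha x,\sigma^2 I), \\
\nabla_x \log p(x\mid\tilde{x}) &= \nabla_x \log p(x) + \frac{\alpha}{\sigma^2}\,(\tilde{x}-\alpha x).
\end{align}
```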

Figures (11)

  • Figure 1: Challenge of multi-modal sampling with score-based MCMC. The true samples represent a mixture of 9 Gaussians and each Gaussian has a standard deviation $\sigma=0.1$. The generated samples are produced by MALA initialized at the origin.
  • Figure 2: Visualization of an MoG target with unequal weights $w = [0.1,0.1,0.1,0.7]$ for different components. (a) Density heatmap of the target $p(x)$, a clean sample $x^{(i-1)}$ and a noisy sample $\tilde{x}^{(i-1)}$. (b) Density heatmap of the denoising posterior $p(x|\tilde{x}^{(i-1)})$ with Gaussian convolution parameters $\alpha=1,\sigma=1$.
  • Figure 3: Comparison of different initialization techniques for denoising posterior sampling on the unequally weighted MoG target described in Figure 2. In each case, we generate 1,000 samples using a Gaussian convolution kernel with $\alpha=1, \sigma=1$.
  • Figure 4: Effects of the hyperparameters $\alpha,\sigma$ in Gaussian convolution kernels and the number $T$ of noise levels in the variance-preserving (VP) noise scheduling. The y-axis in all three plots is the MMD between true samples and samples generated by DiGS with varying hyperparameters. Experimental setups can be found in the appendices.
  • Figure 5: Comparison of a multi-modal target distribution $p(x)$, tempered distribution $p_{\beta}(x)$, and convolved distribution $p(\tilde{x})$.
  • ...and 6 more figures

Theorems & Definitions (2)

  • Theorem 2.1
  • proof