Diffusive Gibbs Sampling

Wenlin Chen; Mingtian Zhang; Brooks Paige; José Miguel Hernández-Lobato; David Barber

TL;DR

Diffusive Gibbs Sampling (DiGS) addresses the challenge of sampling from multi-modal unnormalized targets by pairing Gaussian convolution with a Metropolis-within-Gibbs scheme on the joint space $p(x,\tilde{x})=p(\tilde{x}|x)p(x)$. By alternately sampling the noisy variable $\tilde{x}$ and the denoised variable $x$, and by introducing a Metropolis-Hastings (MH) based initialization for the denoising step, DiGS achieves robust mode exploration without requiring the intractable convolved score. The paper demonstrates strong empirical gains over standard MCMC baselines (MALA, HMC, parallel tempering) on synthetic mixture-of-Gaussians (MoG) problems, Bayesian neural networks, and molecular configuration sampling, including substantial reductions in energy evaluations for molecular-dynamics-like tasks. A multi-level, variance-preserving noise schedule further improves efficiency. The method is positioned as a practical, scalable member of the auxiliary-variable MCMC family, with clear avenues for future theoretical and methodological improvements.
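The alternation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 1D two-mode target, the function names (`log_p`, `grad_log_p`, `digs`), and all hyperparameter values are assumptions chosen for illustration, and MALA is used as one possible sampler for the denoising posterior. The MH initialization proposes from $\mathcal{N}(x|\tilde{x}/\alpha,(\sigma/\alpha)^2)$, for which the acceptance ratio reduces to $p(x')/p(x)$.

```python
import numpy as np

def log_p(x):
    # Hypothetical toy target: unnormalized 1D mixture of two Gaussians
    # centred at -2 and +2, each with standard deviation 0.5.
    return np.logaddexp(-0.5 * ((x + 2.0) / 0.5) ** 2,
                        -0.5 * ((x - 2.0) / 0.5) ** 2)

def grad_log_p(x):
    # Score of the toy target, computed from component responsibilities.
    a = -0.5 * ((x + 2.0) / 0.5) ** 2
    b = -0.5 * ((x - 2.0) / 0.5) ** 2
    w_left = np.exp(a - np.logaddexp(a, b))
    return w_left * (-(x + 2.0) / 0.25) + (1.0 - w_left) * (-(x - 2.0) / 0.25)

def digs(x0, n_iters, alpha=1.0, sigma=2.0, mala_steps=5, step=0.05, seed=0):
    # Illustrative hyperparameters; not values taken from the paper.
    rng = np.random.default_rng(seed)
    x = float(x0)
    samples = np.empty(n_iters)

    def log_post(x, xt):
        # log p(x | xt) up to a constant: log p(x) + log N(xt | alpha x, sigma^2)
        return log_p(x) - 0.5 * ((xt - alpha * x) / sigma) ** 2

    def grad_log_post(x, xt):
        return grad_log_p(x) + alpha * (xt - alpha * x) / sigma ** 2

    for i in range(n_iters):
        # Noising step: exact draw from the Gaussian kernel p(xt | x).
        xt = alpha * x + sigma * rng.standard_normal()

        # MH initialization: independence proposal N(xt / alpha, (sigma / alpha)^2).
        # The Gaussian factors cancel, so only p(x') / p(x) remains.
        x_prop = xt / alpha + (sigma / alpha) * rng.standard_normal()
        if np.log(rng.uniform()) < log_p(x_prop) - log_p(x):
            x = x_prop

        # Denoising step: a few MALA updates targeting p(x | xt).
        for _ in range(mala_steps):
            mean_fwd = x + step * grad_log_post(x, xt)
            y = mean_fwd + np.sqrt(2.0 * step) * rng.standard_normal()
            mean_bwd = y + step * grad_log_post(y, xt)
            log_q_fwd = -((y - mean_fwd) ** 2) / (4.0 * step)
            log_q_bwd = -((x - mean_bwd) ** 2) / (4.0 * step)
            log_acc = log_post(y, xt) - log_post(x, xt) + log_q_bwd - log_q_fwd
            if np.log(rng.uniform()) < log_acc:
                x = y

        samples[i] = x
    return samples

samples = digs(x0=-2.0, n_iters=3000)
print("fraction of samples in the right-hand mode:", np.mean(samples > 0))
```

Only the target's own log-density and score are evaluated; the score of the convolved distribution $p(\tilde{x})$ never appears. Starting the chain in the left mode, the MH initialization is what lets it jump to the right mode in this toy setting.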

Abstract

The inadequate mixing of conventional Markov Chain Monte Carlo (MCMC) methods for multi-modal distributions presents a significant challenge in practical applications such as Bayesian inference and molecular dynamics. Addressing this, we propose Diffusive Gibbs Sampling (DiGS), an innovative family of sampling methods designed for effective sampling from distributions characterized by distant and disconnected modes. DiGS integrates recent developments in diffusion models, leveraging Gaussian convolution to create an auxiliary noisy distribution that bridges isolated modes in the original space and applying Gibbs sampling to alternately draw samples from both spaces. A novel Metropolis-within-Gibbs scheme is proposed to enhance mixing in the denoising sampling step. DiGS exhibits a better mixing property for sampling multi-modal distributions than state-of-the-art methods such as parallel tempering, attaining substantially improved performance across various tasks, including mixtures of Gaussians, Bayesian neural networks and molecular dynamics.

Paper Structure (36 sections, 1 theorem, 24 equations, 11 figures, 5 tables, 1 algorithm)

Introduction
Score-Based MCMC Methods
Convolution-Based Method
Diffusive Gibbs Sampling
Sampler Construction
Initialization of the Denoising Sampling Step
A Metropolis-within-Gibbs Scheme
Choosing the Gaussian Convolution Kernels
Multi-Level Noise Scheduling
Comparison to Related Methods
Tempering-Based Sampling
Score-Based Diffusion Model
Proximal Sampler
Reverse Diffusion Monte Carlo
Auxiliary Variable MCMC
...and 21 more sections

Key Result

Theorem 2.1

For an absolutely continuous target distribution $p(x)$, DiGS with a Gaussian convolution kernel $p(\tilde{x}|x)=\mathcal{N}(\tilde{x}|\alpha x,\sigma^2I)$ ($\alpha>0, \sigma>0$) yields a $p(x,\tilde{x})$-irreducible and recurrent Markov Chain.
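As a worked illustration that uses only the kernel stated in the theorem (a sketch, not a reproduction of the paper's equations), the denoising posterior and its score follow directly from Bayes' rule:

$$p(x|\tilde{x}) \propto p(x)\,\mathcal{N}(\tilde{x}|\alpha x,\sigma^2 I), \qquad \nabla_x \log p(x|\tilde{x}) = \nabla_x \log p(x) + \frac{\alpha(\tilde{x}-\alpha x)}{\sigma^2}.$$

Viewed as a function of $x$, the Gaussian factor satisfies $\mathcal{N}(\tilde{x}|\alpha x,\sigma^2 I) \propto \mathcal{N}(x|\tilde{x}/\alpha,(\sigma/\alpha)^2 I)$. If a proposal $x'$ is drawn from that Gaussian, the Gaussian terms cancel in the Metropolis-Hastings ratio, and the acceptance probability becomes $\min\left(1, p(x')/p(x)\right)$, which needs only the unnormalized target density.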

Figures (11)

Figure 1: Challenge of multi-modal sampling with score-based MCMC. The true samples represent a mixture of 9 Gaussians and each Gaussian has a standard deviation $\sigma=0.1$. The generated samples are produced by MALA initialized at the origin.
Figure 2: Visualization of an MoG target with unequal weights $w = [0.1,0.1,0.1,0.7]$ for different components. (a) Density heatmap of the target $p(x)$, a clean sample $x^{(i-1)}$ and a noisy sample $\tilde{x}^{(i-1)}$. (b) Density heatmap of the denoising posterior $p(x|\tilde{x}^{(i-1)})$ with Gaussian convolution parameters $\alpha=1,\sigma=1$.
Figure 3: Comparison of different initialization techniques for denoising posterior sampling on the unequally weighted MoG target described in Figure 2. In each case, we generate 1,000 samples using a Gaussian convolution kernel with $\alpha=1, \sigma=1$.
Figure 4: Effects of the hyperparameters $\alpha,\sigma$ in Gaussian convolution kernels and the number $T$ of noise levels in the variance-preserving (VP) noise scheduling. The y-axis in all three plots is the MMD between true samples and samples generated by DiGS with varying hyperparameters. Experimental setups can be found in Appendices \ref{app:conv-param-comparison} and \ref{app:multi-level}.
Figure 5: Comparison of a multi-modal target distribution $p(x)$, tempered distribution $p_{\beta}(x)$, and convolved distribution $p(\tilde{x})$.
...and 6 more figures

Theorems & Definitions (2)

Theorem 2.1
proof
