
Divide, Interact, Sample: The Two-System Paradigm

James Chok, Myung Won Lee, Daniel Paulin, Geoffrey M. Vasil

TL;DR

The paper addresses the challenge of efficiently sampling high-dimensional distributions by unifying ensemble-chain, mean-field, and adaptive MCMC within a single two-system framework, in which two interacting subsystems propose updates for one another while preserving the target distribution $\rho$. By deriving two-system versions of overdamped and underdamped Langevin samplers (MALA and MAKLA) in both continuous and discrete time, the authors obtain parallel, Metropolis-Hastings-corrected updates at lower computational cost than traditional ensemble methods. Experiments on synthetic targets and posteriordb benchmarks show that adaptive two-system MAKLA variants achieve order-of-magnitude improvements in effective sample size per gradient evaluation over NUTS, and that this performance holds up as the dimension grows. The framework also clarifies the connections between ensemble, mean-field, and adaptive approaches, yielding practical algorithms with theoretical guarantees and a scalable path to high-throughput Bayesian computation. The authors release open-source implementations to facilitate adoption and replication of their results.
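
The paragraph above describes subsystems that precondition each other's MH-corrected Langevin moves. The paper's exact two-system MALA is not reproduced here; what follows is a minimal Python sketch that assumes the interaction is an empirical-covariance preconditioner (a choice familiar from affine-invariant ensemble samplers). Each subsystem takes one preconditioned MALA step using a matrix computed only from the other, frozen subsystem, so every particle's kernel is reversible with respect to $\rho$ and alternating the two half-sweeps leaves $\rho^{\otimes 2N}$ invariant. The names `mala_step`, `two_system_mala`, the regularizer `eps`, and the Gaussian test target are all hypothetical.

```python
import numpy as np


def mala_step(x, log_rho, grad_log_rho, C, h, rng):
    """One preconditioned MALA step applied to every row of x (shape (N, d)).

    C is a fixed symmetric positive-definite preconditioner; because it does not
    depend on x, each row's kernel is reversible with respect to rho.
    """
    L = np.linalg.cholesky(C)

    def mean(z):
        # Langevin drift z + h * C grad log rho(z), row-wise (C is symmetric).
        return z + h * grad_log_rho(z) @ C

    def log_q(to, frm):
        # log N(to; mean(frm), 2hC) up to a constant shared by both directions.
        r = np.linalg.solve(L, (to - mean(frm)).T)
        return -np.sum(r**2, axis=0) / (4.0 * h)

    prop = mean(x) + np.sqrt(2.0 * h) * rng.standard_normal(x.shape) @ L.T
    log_alpha = log_rho(prop) - log_rho(x) + log_q(x, prop) - log_q(prop, x)
    accept = np.log(rng.uniform(size=x.shape[0])) < log_alpha
    return np.where(accept[:, None], prop, x)


def two_system_mala(log_rho, grad_log_rho, x0, y0, h, n_iter, eps=1e-6, seed=0):
    """Alternate MALA half-sweeps; each subsystem is preconditioned by the other's covariance."""
    rng = np.random.default_rng(seed)
    x, y = x0.copy(), y0.copy()
    d = x.shape[1]
    history = np.empty((n_iter, x.shape[0] + y.shape[0], d))
    for t in range(n_iter):
        C_y = np.atleast_2d(np.cov(y, rowvar=False)) + eps * np.eye(d)  # Y frozen here
        x = mala_step(x, log_rho, grad_log_rho, C_y, h, rng)
        C_x = np.atleast_2d(np.cov(x, rowvar=False)) + eps * np.eye(d)  # X frozen here
        y = mala_step(y, log_rho, grad_log_rho, C_x, h, rng)
        history[t] = np.vstack([x, y])
    return history


# Usage on a hypothetical ill-conditioned Gaussian target with variances 1 and 100.
variances = np.array([1.0, 100.0])


def log_rho(z):
    return -0.5 * np.sum(z**2 / variances, axis=1)


def grad_log_rho(z):
    return -z / variances


init = np.random.default_rng(1).standard_normal((100, 2))
history = two_system_mala(log_rho, grad_log_rho, init[:50], init[50:], h=0.5, n_iter=3000)
draws = history[1000:].reshape(-1, 2)  # discard the first 1000 sweeps as warmup
print(draws.mean(axis=0), draws.var(axis=0))
```

Freezing the partner subsystem is the design choice that matters here: a particle's preconditioner never depends on its own subsystem's current state, so the ordinary MALA acceptance ratio stays exact without extra correction terms.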

Abstract

Mean-field, ensemble-chain, and adaptive samplers have historically been viewed as distinct approaches to Monte Carlo sampling. In this paper, we present a unifying two-system framework that brings all three under one roof. In our approach, an ensemble of particles is split into two interacting subsystems that propose updates for each other in a symmetric, alternating fashion. This cross-system interaction ensures that the overall ensemble has $\rho(x)$ as its invariant distribution in both the finite-particle setting and the mean-field limit. The two-system construction reveals that ensemble-chain samplers can be interpreted as finite-$N$ approximations of an ideal mean-field sampler; conversely, it provides a principled recipe to discretize mean-field Langevin dynamics into tractable parallel MCMC algorithms. The framework also connects naturally to adaptive single-chain methods: by replacing particle-based statistics with time-averaged statistics from a single chain, one recovers analogous adaptive dynamics in the long-time limit without requiring a large ensemble. We derive novel two-system versions of both overdamped and underdamped Langevin MCMC samplers within this paradigm. Across synthetic benchmarks and real-world posterior inference tasks, these two-system samplers exhibit significant performance gains over the popular No-U-Turn Sampler, achieving an order of magnitude higher effective sample sizes per gradient evaluation.
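
The abstract notes that an adaptive single-chain method can replace particle-based statistics with time-averaged ones. One standard way to maintain such time averages is Welford's online update of the running mean and covariance, sketched below under the assumption that these are the statistics of interest (for example, as a stand-in for the other subsystem's ensemble covariance). The class name `RunningMoments` and the AR(1) test chain are hypothetical, and the paper's adaptation schedule is not reproduced.

```python
import numpy as np


class RunningMoments:
    """Welford-style running mean and covariance of a stream of d-dimensional states."""

    def __init__(self, d):
        self.n = 0
        self.mean = np.zeros(d)
        self.m2 = np.zeros((d, d))  # running sum of (x - mean)(x - mean)^T

    def update(self, x):
        self.n += 1
        delta = x - self.mean                      # deviation from the old mean
        self.mean += delta / self.n
        self.m2 += np.outer(delta, x - self.mean)  # old-mean times new-mean deviation

    def cov(self):
        return self.m2 / (self.n - 1)              # unbiased (ddof=1) covariance


# Usage: feed a single chain's states one at a time (hypothetical AR(1) states).
rng = np.random.default_rng(0)
states = np.zeros((5000, 3))
for t in range(1, len(states)):
    states[t] = 0.9 * states[t - 1] + rng.standard_normal(3)

moments = RunningMoments(3)
for s in states:
    moments.update(s)
print(moments.mean, np.diag(moments.cov()))
```

The update costs $O(d^2)$ per state and never stores the chain, which is what makes a single long chain a practical substitute for a large ensemble; as the abstract says, the correspondence only holds in the long-time limit, once the averages have stabilized.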

Paper Structure

This paper contains 34 sections, 12 theorems, 114 equations, 4 figures, 7 tables, and 5 algorithms.

Key Result

Lemma 1

The continuous-time $2N$-particle system (equation eq:continuous_cisl in the paper) has $\rho^{\otimes 2N}$ as its invariant density.
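
Lemma 1 is stated here without the dynamics themselves. As an illustrative check only, assume (our assumption, not necessarily the paper's exact system) that each subsystem follows preconditioned overdamped Langevin dynamics whose symmetric positive-definite preconditioner $C(\cdot)$ depends only on the other subsystem, with $X = (x_1,\dots,x_N)$ and $Y = (y_1,\dots,y_N)$. The product density is then stationary because every probability flux vanishes:

```latex
% Assumed dynamics (W_i, B_j independent Brownian motions):
dx_i = C(Y)\,\nabla\log\rho(x_i)\,dt + \sqrt{2C(Y)}\,dW_i, \qquad
dy_j = C(X)\,\nabla\log\rho(y_j)\,dt + \sqrt{2C(X)}\,dB_j .

% Fokker-Planck equation; C(Y) does not depend on x_i and C(X) does not depend on y_j,
% so each preconditioner passes through the derivatives of its own block:
\partial_t p = \sum_{i=1}^{N} \nabla_{x_i}\cdot\Big[C(Y)\big(\nabla_{x_i}p - p\,\nabla\log\rho(x_i)\big)\Big]
             + \sum_{j=1}^{N} \nabla_{y_j}\cdot\Big[C(X)\big(\nabla_{y_j}p - p\,\nabla\log\rho(y_j)\big)\Big].

% For p = \rho^{\otimes 2N}: \nabla_{x_i}p = p\,\nabla\log\rho(x_i) and \nabla_{y_j}p = p\,\nabla\log\rho(y_j),
% so every bracket vanishes and \partial_t p = 0.
```

If instead $C$ depended on $X$ in the $x$-equation, the divergence of $C(X)$ would appear as an extra drift term; letting each subsystem be preconditioned only by the other avoids that correction.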

Figures (4)

  • Figure 1: Visualization of the randomized step size distribution. The step size $h = \gamma h_{\max}$ is obtained by drawing $\gamma$ from a mixture of a point mass at $\gamma = 1$ with weight $\beta \in (0,1)$ and a continuous component with density $f(\gamma) = 3(1-\gamma)^2$ on $(0,1)$ with weight $1 - \beta$. This construction encourages frequent large proposals while allowing occasional small, exploratory steps, improving robustness across varying curvature scales.
  • Figure 2: Median ESS/Grad vs. dimension across 45 posteriordb posteriors. Each dot is one posterior; indices (1–45) map to the posterior table in Appendix B. The coupled and adaptive MAKLA variants maintain high, nearly flat ESS/Grad across dimensions, whereas NUTS degrades noticeably as dimension increases.
  • Figure 3: Posterior-mean accuracy vs. dimension across 45 posteriordb posteriors. For each posterior, we plot the maximum coordinate-wise absolute relative error (MCARE) of the posterior-mean estimate, $\max_j |\hat{\mu}_j-\mu_j^\star|/\operatorname{std}(\mu_j^\star)$, against the dimension; the $y$-axis is on a log scale. The reference means $\mu^\star$ and standard deviations $\operatorname{std}(\mu_j^\star)$ are computed from the gold-standard reference draws distributed with posteriordb; indices (1–45) map to the posterior table in Appendix B.
  • Figure 4: Histogram of parameter-wise $\widehat{R}$ (Gelman–Rubin statistic) values across the 45 posteriordb posteriors. For each of the 45 models and every scalar parameter, we compute $\widehat{R}$ after warmup and pool all values into one distribution per sampler, for three samplers: Coupled MAKLA (purple), 1sys-Adaptive MAKLA (green), and 2sys-Adaptive MAKLA (red). All methods concentrate extremely close to the ideal $\widehat{R}=1$ (note the tight axis range), indicating good cross-chain mixing; values closer to 1 are better.
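
The randomized step size of Figure 1 can be drawn by inverse-CDF sampling: the continuous component $f(\gamma) = 3(1-\gamma)^2$ has CDF $F(\gamma) = 1-(1-\gamma)^3$, so $\gamma = 1-(1-u)^{1/3}$ for $u \sim \mathrm{Uniform}(0,1)$. The sketch below illustrates that reading of the caption; `sample_step_size` and the values of `h_max` and `beta` are hypothetical.

```python
import numpy as np


def sample_step_size(h_max, beta, size, rng):
    """Draw h = gamma * h_max, gamma ~ beta * delta_1 + (1 - beta) * f, f(g) = 3(1 - g)^2."""
    u = rng.uniform(size=size)
    gamma_cont = 1.0 - np.cbrt(1.0 - u)          # inverse CDF of f on (0, 1)
    use_max = rng.uniform(size=size) < beta      # point mass at gamma = 1 with weight beta
    gamma = np.where(use_max, 1.0, gamma_cont)
    return gamma * h_max


rng = np.random.default_rng(0)
h = sample_step_size(h_max=0.2, beta=0.5, size=200_000, rng=rng)
# f is the Beta(1, 3) density (mean 1/4), so E[gamma] = beta + (1 - beta) / 4.
print(h.mean() / 0.2)
```

Since $f$ is exactly the Beta(1, 3) density, `rng.beta(1.0, 3.0, size)` would be an equivalent way to draw the continuous component.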
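
Figure 2's metric divides an effective sample size by the number of gradient evaluations. The caption does not specify the ESS estimator or what the median is taken over; the sketch below assumes a single-chain estimator that sums autocorrelations with Geyer's initial-positive-sequence truncation, and a gradient count `n_grad` supplied by the caller. All function names are hypothetical.

```python
import numpy as np


def autocorr(x):
    """Normalized sample autocorrelation of a 1-D chain at lags 0..n-1, via FFT."""
    n = len(x)
    xc = x - x.mean()
    f = np.fft.rfft(xc, n=2 * n)                 # zero-padding avoids circular wrap-around
    acov = np.fft.irfft(f * np.conj(f))[:n] / n  # biased autocovariance estimate
    return acov / acov[0]


def ess(x):
    """Single-chain ESS = n / tau, with tau truncated by Geyer's initial positive sequence."""
    rho = autocorr(np.asarray(x, dtype=float))
    n = len(rho)
    tau = -1.0  # tau = -1 + 2 * (sum of leading positive pair sums) = 1 + 2 * sum_{k>=1} rho_k
    for k in range(0, n - 1, 2):
        pair = rho[k] + rho[k + 1]
        if pair < 0:
            break
        tau += 2.0 * pair
    return n / tau


def ess_per_grad(x, n_grad):
    """ESS divided by the number of gradient evaluations spent producing the chain."""
    return ess(x) / n_grad


# Usage: an AR(1) chain with phi = 0.5 has ESS close to n * (1 - phi) / (1 + phi) = n / 3.
rng = np.random.default_rng(0)
n, phi = 100_000, 0.5
noise = rng.standard_normal(n)
chain = np.empty(n)
chain[0] = noise[0]
for t in range(1, n):
    chain[t] = phi * chain[t - 1] + np.sqrt(1 - phi**2) * noise[t]
print(ess(chain), ess_per_grad(chain, n_grad=n))  # n_grad = n assumes one gradient per step
```

For a multi-parameter posterior one would compute this per coordinate and then summarize, for example by the median; whether Figure 2's median runs over parameters or over repeated runs is an assumption left open here.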
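
The MCARE metric of Figure 3 is a one-line computation once reference draws are available. A minimal sketch, assuming `draws` and `ref_draws` are arrays of shape (n_draws, d) and reading $\operatorname{std}(\mu_j^\star)$ as the sample standard deviation of coordinate $j$ over the reference draws (our interpretation; `ddof=1` is an arbitrary choice and `mcare` is a hypothetical name):

```python
import numpy as np


def mcare(draws, ref_draws):
    """max_j |mean(draws_j) - mean(ref_j)| / std(ref_j) over coordinates j."""
    mu_hat = draws.mean(axis=0)
    mu_ref = ref_draws.mean(axis=0)
    sd_ref = ref_draws.std(axis=0, ddof=1)  # per-coordinate reference standard deviation
    return np.max(np.abs(mu_hat - mu_ref) / sd_ref)


ref = np.array([[0.0, 0.0], [2.0, 4.0]])  # reference means (1, 2), std (sqrt 2, 2 sqrt 2)
est = np.array([[1.5, 2.0]])              # estimate is off by 0.5 in coordinate 0 only
print(mcare(est, ref))
```

Normalizing by the posterior standard deviation makes the error scale-free, so coordinates measured in very different units can share one maximum and one log-scale axis.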
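
Figure 4 pools per-parameter $\widehat{R}$ values. The caption does not say which variant of the Gelman–Rubin statistic is used; the sketch below assumes the classic split-$\widehat{R}$ without rank normalization, in which each chain is halved and the pooled variance estimate is compared with the mean within-chain variance. `split_rhat` and the synthetic chains are hypothetical.

```python
import numpy as np


def split_rhat(chains):
    """Split-R-hat for one scalar parameter; chains has shape (m, n)."""
    chains = np.asarray(chains, dtype=float)
    half = chains.shape[1] // 2
    # Split each chain in two so that drift within a chain also inflates R-hat.
    split = np.concatenate([chains[:, :half], chains[:, half:2 * half]], axis=0)
    n = split.shape[1]
    W = split.var(axis=1, ddof=1).mean()    # mean within-chain variance
    B = n * split.mean(axis=1).var(ddof=1)  # between-chain variance
    var_plus = (n - 1) / n * W + B / n      # pooled posterior-variance estimate
    return np.sqrt(var_plus / W)


rng = np.random.default_rng(0)
same = rng.standard_normal((4, 2000))                # four well-mixed chains
shifted = same + np.array([0.0, 0.0, 0.0, 3.0])[:, None]  # one chain stuck elsewhere
print(split_rhat(same), split_rhat(shifted))
```

Well-mixed chains give $\widehat{R}$ just above 1, while a single displaced chain pushes it well above common thresholds such as 1.01, which is why the tight concentration near 1 in Figure 4 indicates good cross-chain mixing.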

Theorems & Definitions (27)

  • Remark 1
  • Lemma 1
  • Proof
  • Remark 2
  • Remark 3
  • Corollary 1 (Theorem 1.2 of Carmona, 2016)
  • Theorem 1
  • Proof
  • Theorem 2
  • Proof
  • ...and 17 more