Improving the evaluation of samplers on multi-modal targets
Louis Grenioux, Maxence Noble, Marylou Gabrié
TL;DR
Evaluating samplers on multi-modal targets is difficult because both discovering modes and estimating their relative weights become harder as dimension grows. The paper proposes a synthetic benchmark based on a bi-modal Gaussian mixture, with controllable dimension and mode separation, and uses a mode-weight metric to quantify sampler performance in an interpretable way. The study covers local MCMC, importance sampling, variational inference, annealed methods, and diffusion-based approaches. Annealing-based samplers (SMC, Replica Exchange) robustly recover mode proportions in moderate settings, and diffusion-based methods (SLIPS, DDS) show promise with careful tuning, whereas vanilla MCMC, IS, and VI struggle as separation and dimension grow. The framework helps diagnose sampler strengths and guide the development of robust, scalable methods for multi-modal sampling.
Abstract
Addressing multi-modality constitutes one of the major challenges of sampling. In this reflection paper, we advocate for a more systematic evaluation of samplers with respect to two sources of difficulty: mode separation and dimension. To this end, we propose a synthetic experimental setting, which we illustrate on a selection of samplers, focusing on the challenging criterion of recovering the relative importance of the modes. Such evaluations are crucial for diagnosing the potential of samplers to handle multi-modality and therefore for driving progress in the field.
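As an illustration of the kind of setting described above, the following minimal sketch builds a bi-modal Gaussian mixture whose dimension and mode separation can be varied, and estimates the weight of one mode from a set of samples. It is not the authors' implementation. The function names, the choice of means at plus and minus `separation` times the all-ones vector, the identity covariances, and the nearest-mean assignment rule are all assumptions made for this example; the paper's exact parametrisation and metric may differ.

```python
import numpy as np


def make_means(dim, separation):
    """Return the two mode centres (assumed layout: -a*1 and +a*1)."""
    ones = np.ones(dim)
    return -separation * ones, separation * ones


def sample_bimodal_mixture(n, dim, separation, weight, rng):
    """Draw n exact samples from weight*N(mean_a, I) + (1-weight)*N(mean_b, I)."""
    mean_a, mean_b = make_means(dim, separation)
    # Pick a mode for every sample, then add standard Gaussian noise.
    in_mode_a = rng.random(n) < weight
    centres = np.where(in_mode_a[:, None], mean_a, mean_b)
    return centres + rng.standard_normal((n, dim))


def estimate_mode_weight(samples, mean_a, mean_b):
    """Fraction of samples closer to mean_a than to mean_b (assumed metric)."""
    dist_a = np.linalg.norm(samples - mean_a, axis=1)
    dist_b = np.linalg.norm(samples - mean_b, axis=1)
    return float(np.mean(dist_a < dist_b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, separation, weight = 8, 2.0, 2.0 / 3.0
    mean_a, mean_b = make_means(dim, separation)

    # Reference: exact samples should give an estimate close to the true weight.
    exact = sample_bimodal_mixture(20_000, dim, separation, weight, rng)
    print("exact sampler:", estimate_mode_weight(exact, mean_a, mean_b))

    # A sampler stuck in the first mode reports all of its mass there.
    stuck = mean_a + rng.standard_normal((1_000, dim))
    print("stuck sampler:", estimate_mode_weight(stuck, mean_a, mean_b))
```

Comparing the estimated weight with the true `weight` while sweeping `dim` and `separation` gives a simple, interpretable diagnostic: a sampler that never crosses between modes reports a weight near 0 or 1 regardless of the target proportions.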
