General-purpose post-sampling reweighting method for multimodal target measures

Pierre Monmarché

TL;DR

The paper addresses how to obtain correct relative weights for the modes of a multimodal distribution once those modes have been found and sampled locally, using a purely post-sampling reweighting step. It formulates a variational-inference-like objective: minimize the KL divergence between a weighted mixture $\pi(p)=\sum_k p_k\nu_k$ and the target measure $\mu$, where $\nu_k$ is the local density built from the samples of cluster $k$. The weights $p$ are optimized over the simplex by projected exponential gradient descent with stochastic gradient estimates computed from the cluster data, and exact or near-exact solutions are available in the disjoint-support case. In numerical experiments on Gaussian mixtures, tempered Langevin, and high-dimensional setups, the approach recovers the weights reliably and improves estimates, especially when cluster supports are well separated. The reweighting corrects mode weights without generating new samples, which allows accurate estimates of observables from existing multimodal samples in moderate dimensions and with multiple modes.
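
The disjoint-support case can be worked out by hand. The following is a short derivation under the assumption that the $\nu_k$ have pairwise disjoint supports; the symbol $c_k$ is notation introduced here for illustration, and the paper's own statement may be written differently.

```latex
\begin{align*}
\mathrm{KL}\big(\pi(p)\,\|\,\mu\big)
 &= \sum_{k=1}^K p_k \int \nu_k \log\frac{p_k\,\nu_k}{\mu}
  = \sum_{k=1}^K p_k \log p_k + \sum_{k=1}^K p_k\, c_k,
 \qquad c_k := \int \nu_k \log\frac{\nu_k}{\mu}, \\
p_k^* &= \frac{e^{-c_k}}{\sum_{j=1}^K e^{-c_j}} .
\end{align*}
```

The first line uses that $\pi(p)=p_k\nu_k$ on the support of $\nu_k$; the second follows from the first-order condition $\log p_k + 1 + c_k = \lambda$ under the constraint $\sum_k p_k = 1$. Replacing $\mu$ by an unnormalized density shifts every $c_k$ by the same constant and leaves $p^*$ unchanged.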

Abstract

When sampling multi-modal probability distributions, correctly estimating the relative probability of each mode, even when the modes have been discovered and locally sampled, remains challenging. We test a simple reweighting scheme designed for this situation, which consists in minimizing (in terms of weights) the Kullback-Leibler divergence of a weighted (regularized) empirical distribution of the samples with respect to the target measure.
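
The reweighting idea can be illustrated with a minimal Python sketch. It is a toy setup, not the paper's implementation: the local densities $\nu_k$ are assumed to be known unit-variance Gaussians (in practice they would be a regularized empirical distribution of each cluster's samples), the update is a plain exponentiated-gradient step followed by renormalization, and the names `log_gauss`, `log_mixture`, `reweight`, as well as the values of `d`, `a`, `n`, `step`, are hypothetical choices made for this example.

```python
import numpy as np

def log_gauss(x, mean):
    """Log-density of N(mean, I_d) at the rows of x (shape (n, d))."""
    d = x.shape[1]
    return -0.5 * np.sum((x - mean) ** 2, axis=1) - 0.5 * d * np.log(2 * np.pi)

def log_mixture(x, weights, means):
    """Log-density of sum_k weights[k] * N(means[k], I_d) at the rows of x."""
    comps = np.stack([np.log(w) + log_gauss(x, m) for w, m in zip(weights, means)])
    top = comps.max(axis=0)
    return top + np.log(np.exp(comps - top).sum(axis=0))  # stable log-sum-exp

def reweight(samples, means, log_target, n_iter=100, step=0.5):
    """Exponentiated-gradient descent on the simplex for KL(pi(p) | mu)."""
    K = len(samples)
    p = np.full(K, 1.0 / K)  # start from uniform weights
    for _ in range(n_iter):
        # g[k] is a Monte Carlo estimate of E_{nu_k}[log pi(p) - log mu],
        # i.e. the k-th partial derivative of the KL up to an additive constant.
        g = np.array([np.mean(log_mixture(x, p, means) - log_target(x))
                      for x in samples])
        logp = np.log(p) - step * g        # multiplicative update in log-space
        p = np.exp(logp - logp.max())
        p /= p.sum()                       # renormalize onto the simplex
    return p

# Toy target (assumption): two well-separated Gaussian modes with weights 0.7 / 0.3.
rng = np.random.default_rng(0)
d, a, n = 2, 3.0, 2000
means = [np.full(d, -a), np.full(d, a)]
p_true = np.array([0.7, 0.3])

# Each mode is sampled locally with the same number of points,
# so the raw samples carry no information about the relative weights.
samples = [m + rng.standard_normal((n, d)) for m in means]

# The target only needs to be known up to a constant (here shifted by 5.0).
log_target = lambda x: log_mixture(x, p_true, means) + 5.0

p_hat = reweight(samples, means, log_target)
print(p_hat)
```

Because the gradient only enters through the normalized multiplicative update, any additive constant in `log_target` cancels, which is why an unnormalized target density suffices in this sketch.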

Paper Structure (22 sections, 1 theorem, 43 equations, 8 figures)

Introduction
Sampling multimodal distributions.
Our approach.
Organization of the work.
Relation with other methods.
The method
Prerequisite
The optimization problem
Parametrization.
Objective function.
Optimization.
Analysis in simple situations
Case with full support
Case with disjoint supports
Numerical experiments
...and 7 more sections

Key Result

Proposition 1

Assume that there exists $C>0$ such that for all $x\in\mathbb R^d$ and $k\in\llbracket 1,K\rrbracket$, Then there exists $L>0$ such that, for any initial condition $r^0\in\Delta_K$, considering $(p^m)_{m\in\mathbb N}$ and $(q^m)_{m\in\mathbb N}$ respectively given by eq1 and eq:descente-approx with $p^0 = q^0=r^0$, for all $m\in\mathbb N$,

Figures (8)

Figure 1: Bias estimated on $M=48$ runs, for various values of $(a,d)$.
Figure 2: Variance estimated on $M=48$ runs, for various values of $(a,d)$.
Figure 3: Evolution of the estimator along the gradient descent. The blue (resp. orange) line is initialized at the value given by eq:p*no-overlap (resp. 0.5), and the dashed green line is the truth $p^*_{\mathrm{true}}=0.7$.
Figure 4: Bias (plus standard deviation in red) as a function of the number $N$ of samples per mode (in log-scale), estimated on $M=100$ runs.
Figure 5: Mixture of $K=10$ Gaussian distributions. Left: histograms of the first coordinate $X_1$ in each cluster, weighted by $p^*_{\mathrm{true}}$ (given in the label). Right: evolution of $(p_1^m,\dots,p_4^m)$ along the iterations $m$ of the gradient descent, either initialized at the value given by eq:p*no-overlap (solid line) or at $1/K$ (dots), with the true value $p^*_{\mathrm{true}}$ (dashed line) for reference. The same color is used for the same coordinate (i.e. blue for the first coordinate, etc.).
...and 3 more figures

Theorems & Definitions (5)

Remark 1: Case without overlap
Remark 2: Dimension reduction
Proposition 1
proof
Remark 3: Dimension reduction bis
