General-purpose post-sampling reweighting method for multimodal target measures

Pierre Monmarché

TL;DR

The paper addresses how to obtain correct relative weights for the modes of a multimodal distribution once each mode has been sampled locally, using a purely post-sampling reweighting step. It formulates a variational-inference-like objective: minimize the KL divergence of the weighted mixture $\pi(p)=\sum_k p_k\nu_k$ with respect to the target measure $\mu$, where $\nu_k$ is the cluster-local (regularized empirical) density of mode $k$. The weights $p$ are optimized over the simplex by projected exponential gradient descent, using stochastic gradient estimates computed from the cluster data, and the solution is exact or near-exact when the clusters have disjoint supports. Numerical experiments on Gaussian mixtures, tempered Langevin, and high-dimensional setups show reliable weight recovery and improved estimates, especially when cluster supports are well separated. The resulting post-sampling, VI-style reweighting is a practical, sample-efficient way to correct mode weights without generating new samples, so that observables can be estimated accurately from existing multimodal samples in moderate dimensions and with multiple modes.

Abstract

When sampling multi-modal probability distributions, correctly estimating the relative probability of each mode, even when the modes have been discovered and locally sampled, remains challenging. We test a simple reweighting scheme designed for this situation, which consists in minimizing (in terms of weights) the Kullback-Leibler divergence of a weighted (regularized) empirical distribution of the samples with respect to the target measure.
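
The scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the regularized empirical densities $\nu_k$ are taken to be Gaussian kernel density estimates with a hand-picked bandwidth `h`, the gradient of the KL divergence with respect to $p_k$ is estimated by averaging $\log \pi(p) - \log\tilde\mu$ over the samples of cluster $k$ (additive constants cancel after renormalization), and the update is a standard exponentiated-gradient step with step size `eta`. The names `kde_density`, `reweight`, and `demo`, the one-dimensional two-mode target, and all parameter values are hypothetical choices made for the example.

```python
import math
import random


def kde_density(x, centers, h):
    """Gaussian kernel density estimate at x built on the points in `centers`."""
    norm = 1.0 / (len(centers) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - c) / h) ** 2) for c in centers)


def reweight(clusters, log_target, h=0.3, eta=0.5, n_iter=50):
    """Exponentiated-gradient descent on the mixture weights p (illustrative).

    clusters   : list of K lists of 1D samples, one list per mode
    log_target : log of the (possibly unnormalized) target density
    h, eta     : assumed bandwidth and step size, not taken from the paper
    """
    K = len(clusters)
    # dens[k][i][j] = nu_j evaluated at the i-th sample of cluster k.
    dens = [[[kde_density(x, clusters[j], h) for j in range(K)] for x in xs]
            for xs in clusters]
    # logmu[k][i] = log-target evaluated at the i-th sample of cluster k.
    logmu = [[log_target(x) for x in xs] for xs in clusters]

    p = [1.0 / K] * K  # uniform initialization
    for _ in range(n_iter):
        grad = []
        for k in range(K):
            total = 0.0
            for d, lm in zip(dens[k], logmu[k]):
                mix = sum(pj * dj for pj, dj in zip(p, d))  # pi(p) at the sample
                total += math.log(mix) - lm
            # Monte Carlo estimate of dKL/dp_k, up to an additive constant.
            grad.append(total / len(clusters[k]))
        # Multiplicative update in log space, then renormalize onto the simplex.
        logw = [math.log(pk) - eta * g for pk, g in zip(p, grad)]
        top = max(logw)
        w = [math.exp(v - top) for v in logw]
        s = sum(w)
        p = [v / s for v in w]
    return p


def demo(seed=0, n=200, a=4.0, true_w=0.7):
    """Two well-separated unit Gaussians, each sampled locally with n points."""
    rng = random.Random(seed)
    clusters = [[rng.gauss(-a, 1.0) for _ in range(n)],
                [rng.gauss(a, 1.0) for _ in range(n)]]

    def log_target(x):
        # Unnormalized two-mode target with weights (true_w, 1 - true_w).
        return math.log(true_w * math.exp(-0.5 * (x + a) ** 2)
                        + (1.0 - true_w) * math.exp(-0.5 * (x - a) ** 2))

    return reweight(clusters, log_target)
```

Calling `demo()` returns the two estimated weights. Both clusters contain the same number of samples, so the raw empirical distribution would assign each mode weight 0.5; after reweighting, the first weight should move close to the assumed true value 0.7, up to sampling and smoothing error.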

Paper Structure (22 sections, 1 theorem, 43 equations, 8 figures)

Key Result

Proposition 1

Assume that there exists $C>0$ such that, for all $x\in\mathbb R^d$ and $k\in\llbracket 1,K\rrbracket$, [...]. Then there exists $L>0$ such that, for any initial condition $r^0\in\Delta_K$, considering $(p^m)_{m\in\mathbb N}$ and $(q^m)_{m\in\mathbb N}$ respectively given by (eq1) and (eq:descente-approx) with $p^0 = q^0=r^0$, for all $m\in\mathbb N$, [...].

Figures (8)

  • Figure 1: Bias estimated on $M=48$ runs, for various values of $(a,d)$.
  • Figure 2: Variance estimated on $M=48$ runs, for various values of $(a,d)$.
  • Figure 3: Evolution of the estimator along the gradient descent. The blue (resp. orange) line is initialized at the value given by (eq:p*no-overlap) (resp. at 0.5), and the dashed green line is the true value $p^*_{\mathrm{true}}=0.7$.
  • Figure 4: Bias (plus standard deviation in red) as a function of the number $N$ of samples per mode (in log-scale), estimated on $M=100$ runs.
  • Figure 5: Mixture of $K=10$ Gaussian distributions. Left: histograms of the first coordinate $X_1$ in each cluster, weighted by $p^*_{\mathrm{true}}$ (given in the label). Right: evolution of $(p_1^m,\dots,p_4^m)$ along the iterations $m$ of the gradient descent, initialized either at the value given by (eq:p*no-overlap) (solid lines) or at $1/K$ (dots), with the true value $p^*_{\mathrm{true}}$ (dashed lines) for reference. Each coordinate keeps the same color across line styles (blue for the first coordinate, etc.).
  • ...and 3 more figures

Theorems & Definitions (5)

  • Remark 1: Case without overlap
  • Remark 2: Dimension reduction
  • Proposition 1
  • proof
  • Remark 3: Dimension reduction bis
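
Remark 1 concerns the case without overlap. A short derivation, not quoted from the paper, suggests why the weights are then available in closed form. It assumes that the $\nu_k$ have pairwise disjoint supports and that $\mu=\tilde\mu/Z$ with $\tilde\mu$ known and $Z$ unknown; the symbols $\tilde\mu$, $Z$, and $c_k$ are introduced here for illustration only. On the support of $\nu_k$ one has $\pi(p)=p_k\nu_k$, so

$$
\mathrm{KL}\big(\pi(p)\,|\,\mu\big)=\sum_{k=1}^K p_k\log p_k+\sum_{k=1}^K p_k\,c_k+\log Z,
\qquad c_k=\int \nu_k\log\frac{\nu_k}{\tilde\mu}.
$$

This function is strictly convex in $p$. Introducing a Lagrange multiplier $\lambda$ for the constraint $\sum_k p_k=1$ gives $\log p_k+1+c_k+\lambda=0$ for every $k$, hence

$$
p_k^*=\frac{e^{-c_k}}{\sum_{j=1}^K e^{-c_j}}.
$$

Each $c_k$ involves only the local density $\nu_k$ and the unnormalized target, so under these assumptions it can be estimated from the samples of cluster $k$ alone. The statement in the paper may be phrased differently.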