Sequential Monte Carlo approximations of Wasserstein--Fisher--Rao gradient flows

Francesca R. Crucinio, Sahani Pathiraja

TL;DR

The paper tackles efficient sampling from a target distribution $\pi$ by casting it as minimizing $\mathrm{KL}(\mu\|\pi)$ and leveraging gradient-flow structures in the Wasserstein and Fisher--Rao geometries and their combination (WFR). It introduces a stable, sampling-based approximation to the WFR flow that alternates a Wasserstein-like diffusion step with a Fisher--Rao reweighting step, implemented in an SMC framework that comes with convergence guarantees (Prop. is_wfr and Prop. lp). The authors connect SMC samplers to FR-type flows, explore tempering-based variants, and derive computationally cheaper approximations (SMC-ULA, SMC-MALA) while highlighting the trade-offs. Extensive experiments on multimodal and high-dimensional targets show that the proposed SMC-WFR method can outperform birth-death Langevin dynamics and other Monte Carlo methods in convergence speed and robustness, and they provide practical guidelines for when to deploy WFR-based sampling.
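
The alternation of a Wasserstein-like move with a Fisher--Rao reweighting can be sketched for a 1D Gaussian target. This is a minimal illustration under our own assumptions, not the authors' algorithm: the Wasserstein step is a single unadjusted Langevin (ULA) move, the density of the current particle approximation is estimated by the mixture of the Gaussian ULA transition kernels, and the Fisher--Rao step reweights by $(\pi/\mu)^{1-e^{-\gamma}}$, which is the exact FR-flow update of $\mathrm{KL}(\mu\|\pi)$ over a time step $\gamma$. The target $\mathcal{N}(20, 0.1)$ (with 0.1 read as a variance), the step size, the number of particles and all helper names (`smc_wfr_sketch`, `log_target`, `row_logsumexp`) are hypothetical choices.

```python
import numpy as np


def log_target(x, m=20.0, v=0.1):
    # Unnormalised log-density of the assumed target N(m, v); v is read as a variance.
    return -0.5 * (x - m) ** 2 / v


def grad_log_target(x, m=20.0, v=0.1):
    return -(x - m) / v


def row_logsumexp(a):
    # Numerically stable log(sum(exp(a))) along each row.
    amax = a.max(axis=1, keepdims=True)
    return (amax + np.log(np.exp(a - amax).sum(axis=1, keepdims=True))).ravel()


def smc_wfr_sketch(n_particles=500, n_iter=200, gamma=0.01, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=n_particles)   # particles from mu_0 = N(0, 1)
    w = np.full(n_particles, 1.0 / n_particles)  # normalised weights
    lam = 1.0 - np.exp(-gamma)                   # FR exponent for a step of length gamma
    for _ in range(n_iter):
        # Multinomial resampling, so the move starts from equally weighted particles.
        anc = rng.choice(n_particles, size=n_particles, p=w)
        # Wasserstein-like step (assumption): one ULA move with step size gamma.
        drift = x[anc] + gamma * grad_log_target(x[anc])
        x = drift + np.sqrt(2.0 * gamma) * rng.normal(size=n_particles)
        # Estimate mu_n at the new particles by the mixture of ULA kernels N(drift_j, 2 gamma).
        sq = (x[:, None] - drift[None, :]) ** 2
        log_kern = -sq / (4.0 * gamma) - 0.5 * np.log(4.0 * np.pi * gamma)
        log_mu = row_logsumexp(log_kern) - np.log(n_particles)
        # Fisher--Rao step: reweight towards mu^{exp(-gamma)} * pi^{1 - exp(-gamma)}.
        log_w = lam * (log_target(x) - log_mu)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
    return x, w


x, w = smc_wfr_sketch()
print("weighted mean:", np.sum(w * x))
```

Resampling before each move keeps the reweighting step simple, since the proposal is then the equally weighted mixture; the kernel-mixture density estimate costs $O(N^2)$ per iteration, which is one of the trade-offs cheaper variants try to avoid.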

Abstract

We consider the problem of sampling from a probability distribution $π$. It is well known that this can be written as an optimisation problem over the space of probability distributions in which we aim to minimise the Kullback--Leibler divergence from $π$. We consider several partial differential equations (PDEs) whose solution is a minimiser of the Kullback--Leibler divergence from $π$ and connect them to well-known Monte Carlo algorithms. We focus in particular on PDEs obtained by considering the Wasserstein--Fisher--Rao geometry over the space of probabilities and show that these lead to a natural implementation using importance sampling and sequential Monte Carlo. We propose a novel algorithm to approximate the Wasserstein--Fisher--Rao flow of the Kullback--Leibler divergence and conduct an extensive empirical study to identify when these algorithms outperform other popular Monte Carlo algorithms.
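
The link between the Fisher--Rao geometry and importance sampling can be made concrete in the 1D Gaussian setting of Figure 2.1. Writing the FR flow as $\partial_t \mu_t = -\mu_t\big(\log(\mu_t/\pi) - \mathbb{E}_{\mu_t}[\log(\mu_t/\pi)]\big)$ (other normalisations only rescale time), its solution is $\mu_t \propto \mu_0^{e^{-t}}\pi^{1-e^{-t}}$, that is, $\mu_0$ reweighted by $(\pi/\mu_0)^{1-e^{-t}}$. For Gaussian $\mu_0$ and $\pi$ the solution stays Gaussian: precisions and precision-weighted means interpolate with weights $e^{-t}$ and $1-e^{-t}$. The sketch below covers only the pure FR flow, reads the second argument of $\mathcal{N}$ as a variance (an assumption), and uses hypothetical function names.

```python
import numpy as np


def fr_gaussian_flow(t, m0, v0, m_pi, v_pi):
    # Exact FR flow of KL(mu || pi): log mu_t = e^{-t} log mu_0 + (1 - e^{-t}) log pi + const.
    a = np.exp(-t)
    prec = a / v0 + (1.0 - a) / v_pi                          # precisions interpolate
    mean = (a * m0 / v0 + (1.0 - a) * m_pi / v_pi) / prec     # precision-weighted means
    return mean, 1.0 / prec


def kl_gauss_1d(m, v, m_pi, v_pi):
    # KL(N(m, v) || N(m_pi, v_pi)), with v and v_pi variances.
    return 0.5 * (v / v_pi + (m - m_pi) ** 2 / v_pi - 1.0 + np.log(v_pi / v))


ts = np.linspace(0.0, 10.0, 101)
m_t, v_t = fr_gaussian_flow(ts, m0=0.0, v0=1.0, m_pi=20.0, v_pi=0.1)
kl = kl_gauss_1d(m_t, v_t, 20.0, 0.1)
print(kl[0], kl[-1])
```

Because the FR flow only reweights, it can never place mass where $\mu_0$ has none; in particle form this is exactly an importance-sampling step, which motivates pairing it with a Wasserstein move.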

Paper Structure

This paper contains 28 sections, 11 theorems, 114 equations, 13 figures, 2 tables, 3 algorithms.

Key Result

Proposition 3.1

Assume that $\pi$ satisfies the log-Sobolev inequality (eq:lsi) and that there exists $L_\pi>0$ such that $\|\nabla V_\pi(x) - \nabla V_\pi(x')\| \leq L_\pi\|x-x'\|$. If $0\leq \gamma \leq C_{\textrm{LSI}}^{-1}/(4L_\pi^2)$, the approximation of the WFR flow given by (eq:w_semigroup)--(eq:fr_semigroup) satisfies …

Figures (13)

  • Figure 2.1: Evolution of $\mathrm{KL}$ along different PDE flows in the 1D Gaussian case with $\mu_0(x) = \mathcal{N}(x; 0, 1)$ and $\pi(x) = \mathcal{N}(x; 20, 0.1)$ (first row), $\pi(x) = \mathcal{N}(x; 1, 5)$ (second row).
  • Figure 4.1: Comparison of the evolution of the mean, variance and $\mathrm{KL}$ of the exact PDE flows with the approximations provided by Algorithm alg:smc, for the 1D Gaussian target $\pi(x) = \mathcal{N}(x; 20, 0.1)$ and initial distribution $\mu_0(x) = \mathcal{N}(x; 0, 1)$. Top row: Fisher--Rao; bottom row: Wasserstein--Fisher--Rao.
  • Figure 5.1: Comparison of approximations of the target $\pi(x) = \sum_{i=1}^4 w_i \mathcal{N}(x; m_i, C_i)$ for the birth-death Langevin algorithms and our SMC approximation. For the latter, the colour of the particles corresponds to the weight (brighter corresponds to higher weight). We compare both the joint distribution and the marginals.
  • Figure 5.2: 1D Gaussian mixture (Section sec:gm1): Evolution of $W_1$ for ULA, SMC tempering and SMC-WFR over iterations, averaged over 50 repetitions. The initial distribution is $\mu_0(x) = \mathcal{N}(x; 0, 1)$. Left: $\pi_1$ with $m=6$ is more diffuse than $\mu_0$. Right: $\pi_2$ is more concentrated than $\mu_0$.
  • Figure 5.3: 2D Gaussian mixture: Evolution of $W_1$ and mean squared error (MSE) for the covariance matrix against runtime (first row) and number of iterations (second row), averaged over 50 repetitions. The initial distribution is $\mu_0(x) = \mathcal{N}(x; 0, \textsf{Id})$. We compare ULA, MALA, SMC tempering, SMC-WFR, SMC-ULA and SMC-MALA. SMC-WFR achieves the fastest convergence both in number of iterations and in runtime; the ULA- and MALA-based algorithms fail to identify all the modes.
  • ...and 8 more figures
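
Figures 5.2 and 5.3 measure accuracy with the Wasserstein-1 distance $W_1$. For weighted particle approximations on the real line, $W_1$ can be computed exactly as $\int |F(t) - G(t)|\,dt$, where $F$ and $G$ are the two cumulative distribution functions. The helper below is a hypothetical sketch of that computation, not code from the paper; SciPy's `scipy.stats.wasserstein_distance(u_values, v_values, u_weights, v_weights)` computes the same 1D quantity, and in two or more dimensions (as in Figure 5.3) an optimal-transport solver would be needed instead.

```python
import numpy as np


def weighted_cdf(points, weights, grid):
    # Right-continuous CDF of sum_i weights_i * delta_{points_i}, evaluated on `grid`.
    order = np.argsort(points)
    cum = np.concatenate(([0.0], np.cumsum(weights[order])))
    return cum[np.searchsorted(points[order], grid, side="right")]


def w1_weighted_1d(x, wx, y, wy):
    # W_1 between two weighted empirical measures on the real line.
    x, y = np.asarray(x, float), np.asarray(y, float)
    wx = np.asarray(wx, float) / np.sum(wx)
    wy = np.asarray(wy, float) / np.sum(wy)
    grid = np.sort(np.concatenate((x, y)))
    diff = np.abs(weighted_cdf(x, wx, grid) - weighted_cdf(y, wy, grid))
    # Both CDFs are constant between consecutive grid points.
    return float(np.sum(diff[:-1] * np.diff(grid)))


print(w1_weighted_1d([0.0, 1.0], [0.5, 0.5], [1.0, 2.0], [0.5, 0.5]))  # 1.0
```

Weights are normalised inside the helper, so the unnormalised importance weights of an SMC particle system can be passed directly.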

Theorems & Definitions (18)

  • Proposition 3.1
  • Proposition 3.2
  • Corollary 3.3
  • Corollary 3.4
  • Lemma A.1
  • Proof
  • Lemma A.2: Unit-time FR as a deterministic time rescaling of infinite-time FR.
  • Proof
  • Proof
  • Proposition C.1
  • ...and 8 more