An operator splitting analysis of Wasserstein--Fisher--Rao gradient flows

Francesca Romana Crucinio, Sahani Pathiraja

TL;DR

The paper analyzes Wasserstein-Fisher-Rao gradient flows for sampling, focusing on exact operator splitting of the W and FR components. It derives variational formulae for two canonical splitting orders (W-FR and FR-W), establishes Gaussian-case covariances and KL decay analyses, and proves log-concavity preservation for the WFR flow under suitable conditions. A key finding is that, with an appropriate step size, splitting can accelerate convergence to the target distribution beyond the exact WFR dynamics, as quantified by KL and Jeffreys divergence rates; the improvement depends on initialization, target geometry, and operator ordering. The results provide theoretical guidance for designing split-flow samplers that exploit splitting bias to achieve faster convergence in practice, especially in strongly log-concave settings.

Abstract

Wasserstein-Fisher-Rao (WFR) gradient flows have recently been proposed as a powerful sampling tool that combines the advantages of pure Wasserstein (W) and pure Fisher-Rao (FR) gradient flows. Existing algorithmic developments implicitly make use of operator splitting techniques to numerically approximate the WFR partial differential equation, whereby the W flow is evaluated over a given step size and then the FR flow (or vice versa). This work investigates the impact of the order in which the W and FR operators are evaluated and aims to provide a quantitative analysis. Somewhat surprisingly, we show that with a judicious choice of step size and operator ordering, the split scheme can converge to the target distribution faster than the exact WFR flow (in terms of model time). We obtain variational formulae describing the evolution over one time step of both sequential splitting schemes and investigate in which settings the W-FR split should be preferred to the FR-W split. As a step towards this goal, we show that the WFR gradient flow preserves log-concavity and obtain the first sharp decay bound for WFR.
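
The two sequential splitting orders can be illustrated in the 1D Gaussian case, where both sub-flows have closed forms. The sketch below is not taken from the paper's equations. It assumes the standard closed-form solutions for a Gaussian target: the W flow of the KL divergence is the Fokker-Planck equation of an Ornstein-Uhlenbeck process, and the FR flow gives the geometric interpolation $\mu_\gamma \propto \mu_0^{e^{-\gamma}} \pi^{1 - e^{-\gamma}}$. It also assumes that "W-FR" means the W flow is applied first. The function names (`w_step`, `fr_step`, `split_step`, `kl_gauss`) are hypothetical, and the parameter values are borrowed from the Figure 3.1 caption.

```python
import math


def w_step(m, C, gamma, m_pi, C_pi):
    """Exact W (Fokker-Planck / OU) flow over time gamma for a 1D Gaussian.

    Assumed closed form: mean and variance relax exponentially to the target.
    """
    decay = math.exp(-gamma / C_pi)
    return m_pi + (m - m_pi) * decay, C_pi + (C - C_pi) * decay ** 2


def fr_step(m, C, gamma, m_pi, C_pi):
    """Exact FR flow over time gamma for a 1D Gaussian.

    Assumed closed form: mu_gamma is proportional to
    mu^{exp(-gamma)} * pi^{1 - exp(-gamma)}, so precisions and
    precision-weighted means interpolate linearly.
    """
    w = math.exp(-gamma)
    prec = w / C + (1.0 - w) / C_pi
    mean = (w * m / C + (1.0 - w) * m_pi / C_pi) / prec
    return mean, 1.0 / prec


def kl_gauss(m, C, m_pi, C_pi):
    """KL(N(m, C) || N(m_pi, C_pi)) for 1D Gaussians (C, C_pi are variances)."""
    return 0.5 * (C / C_pi + (m - m_pi) ** 2 / C_pi - 1.0 + math.log(C_pi / C))


def split_step(m, C, gamma, m_pi, C_pi, order="W-FR"):
    """One sequential split step of size gamma; `order` names the first flow first."""
    if order == "W-FR":
        first, second = w_step, fr_step
    else:
        first, second = fr_step, w_step
    m, C = first(m, C, gamma, m_pi, C_pi)
    return second(m, C, gamma, m_pi, C_pi)


if __name__ == "__main__":
    # Target more diffuse than the initial distribution (values from Figure 3.1, left).
    m0, C0, m_pi, C_pi = 0.0, 1.0, 20.0, 100.0
    gamma = 0.7  # illustrative step size
    for order in ("W-FR", "FR-W"):
        m1, C1 = split_step(m0, C0, gamma, m_pi, C_pi, order=order)
        print(order, "KL after one step:", kl_gauss(m1, C1, m_pi, C_pi))
```

Running the script prints the KL divergence to the target after one step of each ordering, which can be compared across step sizes and initializations. It does not compute the exact (unsplit) WFR flow, so it cannot by itself reproduce the comparison against exact WFR discussed in the paper.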

Paper Structure

This paper contains 20 sections, 14 theorems, 188 equations, 4 figures.

Key Result

Proposition 2.2

Sequential split (W-FR) PDE: Let us denote $\nu_\gamma = \nu_1(x;\gamma)$ and let Assumption \ref{ass:lsi} hold. The variation of one sequential split step in the order W-FR \ref{eq:sequential_split} of size $\gamma$ corresponds to the PDE where $g$ is as defined in \ref{eq:geqn}.

Figures (4)

  • Figure 3.1: Difference in $\mathrm{KL}$ for a single time step $\gamma$ for W-FR split and FR-W on 1D Gaussians. Left: Target more diffuse than initial distribution ($m_\pi = 20, C_\pi = 100, m_0 = 0, C_0 = 1$). Right: Target more concentrated than initial distribution ($m_\pi = 20, C_\pi = 1, m_0 = 0, C_0 = 100$).
  • Figure 3.2: Left: Ratio of KL from the $n$-step W-FR scheme to KL from exact WFR as a function of $t = n \times \gamma$; $\pi$ is a 10D Gaussian (see Appendix B of \cite{us_neurips} for details) and $\gamma = 0.7$. The horizontal black line corresponds to \ref{eq:KLsplitratio}. For reference, $\mathrm{KL}(\mu_{4.2}||\pi) = 9.7$. Right: Same as left but for the FR-W scheme.
  • Figure 4.1: Comparison of the log-concavity constant predicted by Theorem \ref{theo:logconc} and the true log-concavity constant obtained from \ref{eq:WFRGaussexactCt} for a 1D Gaussian with $\mu_0(x) = \mathcal{N}(x; 0, 1)$ and $\pi(x) = \mathcal{N}(x; 0, 100)$ (left), $\pi(x) = \mathcal{N}(x; 0, 5)$ (middle), $\pi(x) = \mathcal{N}(x; 0, 2.1)$ (right). For $C_\pi \leq 2$ the assumptions of Theorem \ref{theo:logconc} are not satisfied.
  • Figure 5.1: Comparison of exact KL decay (left) and symmetrised KL decay (right) for a 1D Gaussian with $m_\pi = 20, C_\pi = 100, m_0 = 0, C_0 = 1$. Left plot: We compare the exact decay of KL from Proposition \ref{prop:KLcomp} with the rates in \ref{eq:rate_wfr} and \ref{eq:sharp_wfr} with $\delta = 0.1, t_0 = 6.9$. Right plot: We compare the exact decay of the symmetrised KL obtained using the mean and covariance evolution in \ref{eq:WFRGaussexactCt}--\ref{eq:WFRGaussexactmt} with the rate in Proposition \ref{prop:decayJexactWFR}.

Theorems & Definitions (27)

  • Remark 2.1
  • Proposition 2.2
  • proof
  • Proposition 2.3
  • proof
  • Lemma 3.1
  • Proposition 3.2
  • Lemma 4.1
  • Theorem 4.1
  • Proposition 5.1
  • ...and 17 more