
Neural Sampling from Boltzmann Densities: Fisher-Rao Curves in the Wasserstein Geometry

Jannis Chemseddine, Christian Wald, Richard Duong, Gabriele Steidl

TL;DR

We propose an interpolation that corresponds to the Wasserstein gradient flow of the Kullback-Leibler divergence, which is related to Langevin dynamics. Numerical examples demonstrate that it provides a well-behaved flow field and successfully solves the task of sampling from an unnormalized Boltzmann density.

Abstract

We deal with the task of sampling from an unnormalized Boltzmann density $\rho_D$ by learning a Boltzmann curve given by energies $f_t$ starting in a simple density $\rho_Z$. First, we examine conditions under which Fisher-Rao flows are absolutely continuous in the Wasserstein geometry. Second, we address specific interpolations $f_t$ and the learning of the related density/velocity pairs $(\rho_t,v_t)$. It was numerically observed that the linear interpolation, which requires only a parametrization of the velocity field $v_t$, suffers from a "teleportation-of-mass" issue. Using tools from the Wasserstein geometry, we give an analytical example in which we can precisely measure the explosion of the velocity field. Inspired by Máté and Fleuret, who parametrize both $f_t$ and $v_t$, we propose an interpolation which parametrizes only $f_t$ and fixes an appropriate $v_t$. This corresponds to the Wasserstein gradient flow of the Kullback-Leibler divergence related to Langevin dynamics. We demonstrate by numerical examples that our model provides a well-behaved flow field which successfully solves the above sampling task.
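For orientation, the objects named in the abstract can be written out explicitly. The following is a sketch under standard assumptions (smooth energies, sufficient decay at infinity), not a quotation from the paper. The normalizing constant $Z_t$, the symbols $f_Z$ and $f_D$ for the energies of $\rho_Z$ and $\rho_D$, the unit time scaling of the gradient flow, and the specific form of the fixed velocity are assumptions introduced here.

$$
\begin{aligned}
\rho_t &= \frac{e^{-f_t}}{Z_t}, \qquad Z_t = \int_{\mathbb{R}^d} e^{-f_t(x)} \, \mathrm{d}x, \\
0 &= \partial_t \rho_t + \nabla \cdot (\rho_t v_t) \quad\Longleftrightarrow\quad \partial_t f_t - \mathbb{E}_{\rho_t}[\partial_t f_t] = \nabla \cdot v_t - \langle \nabla f_t, v_t \rangle, \\
f_t &= (1-t) f_Z + t f_D \quad\Longrightarrow\quad \rho_t \propto \rho_Z^{1-t} \rho_D^{t} \quad \text{(linear interpolation)}, \\
v_t &= \nabla f_t - \nabla f_D = -\nabla \log \frac{\rho_t}{\rho_D} \quad\Longrightarrow\quad \partial_t \rho_t = \nabla \cdot (\rho_t \nabla f_D) + \Delta \rho_t \quad \text{(gradient flow of } \mathrm{KL}(\cdot \,|\, \rho_D)\text{)}.
\end{aligned}
$$

The second line follows from $\nabla \rho_t = -\rho_t \nabla f_t$ and $\partial_t \log Z_t = -\mathbb{E}_{\rho_t}[\partial_t f_t]$. The last line is the Fokker-Planck equation of the Langevin dynamics $\mathrm{d}X_t = -\nabla f_D(X_t)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}W_t$, which is one way to read the statement that fixing $v_t$ ties the learned curve to Langevin dynamics.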

Paper Structure (29 sections, 11 theorems, 86 equations, 6 figures, 4 tables)


Key Result

Theorem 1

Assume that $\boldsymbol{\rho}$ determined by $\int_{[0,1]\times\mathbb{R}^d} f(t,x) \, \mathrm{d} \boldsymbol{\rho} = \int_0^1 \int_{\mathbb{R}^d} f(t,x) \, \mathrm{d} \rho_t \, \mathrm{d} t$ satisfies a so-called partial Poincaré inequality (see the corresponding definition) for some $K > 0$, and that $\frac{1

Figures (6)

  • Figure 1: Evolution of the probability densities $\rho_t \propto \rho_0^{1-t}\rho_1^t$, where $\rho_0 \propto e^{-|x|}$ and $\rho_1 \propto e^{-2\min\{|x|, |x-m|\}}$ for $m = 50$. For different values of $m$, see Figure 4.
  • Figure 2: In the first three figures, the evolution of $\|v_t\|_{L_2(\rho_t)}^2$ belonging to the path in Figure 1 is depicted for different values of the second mode $m>0$. For larger $m$, the norm $\|v_t\|_{L_2(\rho_t)}^2$ approaches the limit $\|v_1\|_{L_2(\rho_1)}^2$ at later times, hence with a steeper slope. The last figure shows $\|v_1\|_{L_2(\rho_1)}^2$ on a log scale as a function of $m$. It demonstrates that $\|v_1\|_{L_2(\rho_1)}$ grows roughly exponentially with $m$.
  • Figure 3: Results for a Gaussian Mixture Model with 40 modes. For the linear and learned interpolation, we show the results for $\sigma=30$; for the gradient flow interpretation, we used $\sigma=1$.
  • Figure 4: Evolution of probability densities $\rho_t$ for $m \in \{1,5,15,50\}$.
  • Figure 5: Results for the linear interpolation for different values of $\sigma$.
  • ...and 1 more figure
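The quantity plotted in Figure 2 can be approximated numerically for the one-dimensional path of Figure 1. The sketch below is not the paper's analytical computation. It assumes that in 1D the velocity is recovered from the continuity equation via the flux $\rho_t v_t = -\int_{-\infty}^x \partial_t \rho_t \, \mathrm{d}y$, with $\partial_t \rho_t = -\rho_t (g - \mathbb{E}_{\rho_t}[g])$ and $g = f_1 - f_0$. The function name `velocity_norm_sq`, the grid parameters `half_width` and `dx`, and the values of $m$ used are hypothetical choices for illustration.

```python
import numpy as np


def velocity_norm_sq(t, m, half_width=30.0, dx=0.01):
    """Estimate ||v_t||^2 in L_2(rho_t) for rho_t proportional to rho_0^(1-t) * rho_1^t in 1D.

    Energies as in Figure 1: f0 = |x| and f1 = 2 * min(|x|, |x - m|).
    Grid size and spacing are illustrative assumptions, not values from the paper.
    """
    n = int(round((m + 2.0 * half_width) / dx)) + 1
    x = np.linspace(-half_width, m + half_width, n)
    h = x[1] - x[0]

    f0 = np.abs(x)
    f1 = 2.0 * np.minimum(np.abs(x), np.abs(x - m))
    f_t = (1.0 - t) * f0 + t * f1

    # Normalized density on the grid (shift the energy for numerical stability).
    w = np.exp(-(f_t - f_t.min()))
    rho = w / (w.sum() * h)

    # Time derivative of the density: d/dt rho_t = -rho_t * (g - E[g]), g = f1 - f0.
    g = f1 - f0
    dt_rho = -rho * (g - np.sum(rho * g) * h)

    # Flux rho_t * v_t = -int_{-inf}^x dt_rho = +int_x^{inf} dt_rho.
    # Integrate from the nearer boundary so tail values keep relative accuracy.
    left = -(np.cumsum(dt_rho) - 0.5 * dt_rho) * h
    right = (np.cumsum(dt_rho[::-1])[::-1] - 0.5 * dt_rho) * h
    flux = np.where(x < 0.5 * m, left, right)

    # ||v_t||^2_{L_2(rho_t)} = int v^2 rho dx = int flux^2 / rho dx.
    return float(np.sum(flux**2 / rho) * h)


# Squared velocity norm at the end of the path for a few mode separations.
for m in (1.0, 5.0, 10.0):
    print(m, velocity_norm_sq(1.0, m))
```

For $m = 0$ the path reduces to Laplace densities with rate $1+t$, for which $v_t(x) = -x/(1+t)$ and the squared norm equals $2/(1+t)^4$; this gives a quick sanity check of the discretization. Larger $m$ than shown here would need a more careful treatment of the density in the valley between the modes.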

Theorems & Definitions (30)

  • Theorem 1
  • Theorem 2
  • Remark 3
  • Definition A.1
  • Definition A.2
  • Lemma A.3
  • proof
  • Definition A.4
  • Lemma A.5
  • proof
  • ...and 20 more