Neural Sampling from Boltzmann Densities: Fisher-Rao Curves in the Wasserstein Geometry

Jannis Chemseddine; Christian Wald; Richard Duong; Gabriele Steidl

TL;DR

An interpolation is proposed that corresponds to the Wasserstein gradient flow of the Kullback-Leibler divergence, which is related to Langevin dynamics. Numerical examples demonstrate that it provides a well-behaved flow field which successfully solves the task of sampling from an unnormalized Boltzmann density.

Abstract

We deal with the task of sampling from an unnormalized Boltzmann density $ρ_D$ by learning a Boltzmann curve given by energies $f_t$ starting in a simple density $ρ_Z$. First, we examine conditions under which Fisher-Rao flows are absolutely continuous in the Wasserstein geometry. Second, we address specific interpolations $f_t$ and the learning of the related density/velocity pairs $(ρ_t,v_t)$. It was numerically observed that the linear interpolation, which requires only a parametrization of the velocity field $v_t$, suffers from a "teleportation-of-mass" issue. Using tools from the Wasserstein geometry, we give an analytical example, where we can precisely measure the explosion of the velocity field. Inspired by Máté and Fleuret, who parametrize both $f_t$ and $v_t$, we propose an interpolation which parametrizes only $f_t$ and fixes an appropriate $v_t$. This corresponds to the Wasserstein gradient flow of the Kullback-Leibler divergence related to Langevin dynamics. We demonstrate by numerical examples that our model provides a well-behaved flow field which successfully solves the above sampling task.
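The objects named in the abstract can be written out as a short worked derivation. This is a sketch in standard notation rather than the paper's exact formulation: the symbols $f_Z$, $f_D$ (energies of $\rho_Z$ and $\rho_D$) and $Z_t$ (normalizing constant) are assumptions introduced here for illustration.

$$
\begin{aligned}
\rho_t &= \frac{e^{-f_t}}{Z_t}, \qquad Z_t = \int_{\mathbb{R}^d} e^{-f_t(x)} \, \mathrm{d}x, \\
\partial_t \rho_t &= -\bigl(\partial_t f_t - \mathbb{E}_{\rho_t}[\partial_t f_t]\bigr)\, \rho_t
&& \text{(differentiate } \log \rho_t = -f_t - \log Z_t\text{)}, \\
\partial_t \rho_t + \nabla \cdot (\rho_t v_t) &= 0
&& \text{(continuity equation for the pair } (\rho_t, v_t)\text{)}, \\
\nabla \cdot v_t - \langle \nabla f_t, v_t \rangle &= \partial_t f_t - \mathbb{E}_{\rho_t}[\partial_t f_t]
&& \text{(combine the two lines and divide by } \rho_t\text{)}.
\end{aligned}
$$

For the linear interpolation $f_t = (1-t) f_Z + t f_D$, the energy is fixed and $\partial_t f_t = f_D - f_Z$, so only $v_t$ remains to be learned from the last equation. This is consistent with the abstract's remark that the linear interpolation requires only a parametrization of the velocity field.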

Paper Structure (29 sections, 11 theorems, 86 equations, 6 figures, 4 tables)

Introduction
Contributions.
Related Work.
Wasserstein Flows of Boltzmann Densities
General Background
Wasserstein meets Fisher-Rao Flows
Neural Sampling from Boltzmann Densities
Linear & Learned Interpolation
Linear interpolation.
Learned Interpolation.
Gradient Flow Interpolation
Relation to Wasserstein gradient flows.
Experiments
Gaussian Mixture Model.
Many Well Distribution.
...and 14 more sections

Key Result

Theorem 1

Assume that $\boldsymbol{\rho}$ determined by $\int_{[0,1]\times\mathbb{R}^d} f(t,x) \, \mathrm{d} \boldsymbol{\rho} = \int_0^1 \int_{\mathbb{R}^d} f(t,x) \, \mathrm{d} \rho_t \mathrm{d} t$ satisfies a so-called partial Poincaré inequality, see Definition poincare, for some $K >0$, and that $\frac{1

Figures (6)

Figure 1: Evolution of the probability densities $\rho_t \propto \rho_0^{1-t}\rho_1^t$, where $\rho_0 \propto e^{-|x|}$ and $\rho_1 \propto e^{-2\min\{|x|, |x-m|\}}$ for $m = 50$. For different values of $m$, see Figure 4.
Figure 2: The first three panels show the evolution of $\|v_t\|_{L_2(\rho_t)}^2$ along the path in Figure 1 for different values of the second mode $m>0$. For larger $m$, the norm $\|v_t\|_{L_2(\rho_t)}^2$ approaches the limit $\|v_1\|_{L_2(\rho_1)}^2$ at later times, hence with a steeper slope. The last panel shows $\|v_1\|_{L_2(\rho_1)}^2$ on a log scale as a function of $m$. It demonstrates that $\|v_1\|_{L_2(\rho_1)}$ grows roughly exponentially with $m$.
Figure 3: Results for a Gaussian Mixture Model with 40 modes. For the linear and learned interpolation we show the results for $\sigma=30$; for the gradient flow interpolation we used $\sigma=1$.
Figure 4: Evolution of probability densities $\rho_t$ for $m \in \{1,5,15,50\}$.
Figure 5: Results for the linear interpolation for different values of $\sigma$.
...and 1 more figures
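
The path from Figure 1 can be evaluated numerically to see how late the mass arrives at the second mode. The following is a minimal sketch, not the authors' code: the function names `interpolated_density` and `mass_near_second_mode`, the grid bounds, and the choice of $x > m/2$ as "near the second mode" are all assumptions made for illustration.

```python
import numpy as np


def interpolated_density(x, t, m=50.0):
    """Normalized density rho_t ∝ rho_0^(1-t) * rho_1^t on a uniform grid x.

    rho_0 ∝ exp(-|x|) and rho_1 ∝ exp(-2 min(|x|, |x - m|)), as in Figure 1.
    """
    log_rho0 = -np.abs(x)
    log_rho1 = -2.0 * np.minimum(np.abs(x), np.abs(x - m))
    # Linear interpolation of energies = geometric interpolation of densities.
    log_rho_t = (1.0 - t) * log_rho0 + t * log_rho1
    # Subtract the maximum before exponentiating for numerical stability.
    unnormalized = np.exp(log_rho_t - log_rho_t.max())
    dx = x[1] - x[0]
    return unnormalized / (unnormalized.sum() * dx)  # Riemann-sum normalization


def mass_near_second_mode(t, m=50.0):
    """Probability mass of rho_t on {x > m/2} (hypothetical helper)."""
    # Grid chosen symmetric around m/2 with tails wide enough to be negligible.
    x = np.linspace(-60.0, m + 60.0, 170001)
    rho = interpolated_density(x, t, m)
    dx = x[1] - x[0]
    return rho[x > m / 2.0].sum() * dx


if __name__ == "__main__":
    for t in (0.0, 0.5, 0.9, 0.99, 1.0):
        print(f"t = {t:4.2f}  mass near second mode = {mass_near_second_mode(t):.3e}")
```

Under these assumptions, the mass on $x > m/2$ stays negligible for most of the time interval and reaches one half only at $t = 1$, by the symmetry of $\rho_1$ around $m/2$. A velocity field transporting $\rho_t$ therefore has to move a large amount of mass over a distance of order $m$ in a very short time, which illustrates the "teleportation-of-mass" issue described in the abstract.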

Theorems & Definitions (30)

Theorem 1
Theorem 2
Remark 3
Definition A.1
Definition A.2
Lemma A.3
proof
Definition A.4
Lemma A.5
proof
...and 20 more
