On Sampling with Approximate Transport Maps

Louis Grenioux, Alain Durmus, Éric Moulines, Marylou Gabrié

TL;DR

The paper studies sampling with approximate transport maps built from Normalizing Flows (NF) and compares three NF-enhanced strategies: neural-IS, flow-MCMC, and neutra-MCMC. It shows that methods using flow-based proposals (neural-IS and flow-MCMC) excel at multimodal targets up to moderate dimensions, while neutra-MCMC is more robust for unimodal targets but struggles to cross energy barriers between modes. A new mixing time bound for the Independent Metropolis-Hastings sampler is derived under a local Lipschitz condition on the log weights and strong convexity, revealing dimension-free behavior when the flow quality constant is controlled. Real-world benchmarks in molecular systems, sparse logistic regression, and field theory corroborate the synthetic findings and highlight practical tradeoffs in wall-clock time and parallelizability. Overall, NF-enhanced samplers offer strong advantages when matched to target geometry, with hybrid strategies offering a compelling path for high-dimensional multimodal inference.

Abstract

Transport maps can ease the sampling of distributions with non-trivial geometries by transforming them into distributions that are easier to handle. The potential of this approach has risen with the development of Normalizing Flows (NF) which are maps parameterized with deep neural networks trained to push a reference distribution towards a target. NF-enhanced samplers recently proposed blend (Markov chain) Monte Carlo methods with either (i) proposal draws from the flow or (ii) a flow-based reparametrization. In both cases, the quality of the learned transport conditions performance. The present work clarifies for the first time the relative strengths and weaknesses of these two approaches. Our study concludes that multimodal targets can be reliably handled with flow-based proposals up to moderately high dimensions. In contrast, methods relying on reparametrization struggle with multimodality but are more robust otherwise in high-dimensional settings and under poor training. To further illustrate the influence of target-proposal adequacy, we also derive a new quantitative bound for the mixing time of the Independent Metropolis-Hastings sampler.
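
As an illustration of option (i), the following is a minimal sketch of an Independent Metropolis-Hastings (IMH) sampler in one dimension. It is not the authors' implementation: the target is a two-mode Gaussian mixture chosen for illustration, and a wide Gaussian (`log_proposal`, with a hypothetical `scale` parameter) stands in for the push-forward of a trained flow. All function names and default values are assumptions.

```python
import math
import random


def log_target(x, a=2.0, sigma=1.0):
    """Unnormalized log-density of N(-a, sigma^2)/2 + N(a, sigma^2)/2."""
    l1 = -0.5 * ((x + a) / sigma) ** 2
    l2 = -0.5 * ((x - a) / sigma) ** 2
    m = max(l1, l2)  # log-sum-exp for numerical stability
    return m + math.log(math.exp(l1 - m) + math.exp(l2 - m))


def log_proposal(x, scale=3.0):
    """Unnormalized log-density of N(0, scale^2); a stand-in for a flow (assumption)."""
    return -0.5 * (x / scale) ** 2


def imh(n_steps, x0=0.0, scale=3.0, seed=0):
    """Run IMH: proposals are drawn independently of the current state."""
    rng = random.Random(seed)
    x = x0
    log_w_x = log_target(x) - log_proposal(x, scale)  # log importance weight
    samples, n_accept = [], 0
    for _ in range(n_steps):
        y = rng.gauss(0.0, scale)  # independent draw from the proposal
        log_w_y = log_target(y) - log_proposal(y, scale)
        # Accept with probability min(1, w(y) / w(x)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_w_y - log_w_x:
            x, log_w_x = y, log_w_y
            n_accept += 1
        samples.append(x)
    return samples, n_accept / n_steps
```

Calling `samples, rate = imh(20000)` returns the chain and its acceptance rate. Because the acceptance ratio depends only on the importance weights, the chain can jump between modes whenever the proposal covers both, but it stalls at states where the weight is large, which is why the adequacy between proposal and target matters.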

Paper Structure

This paper contains 67 sections, 7 theorems, 91 equations, 20 figures, and 9 tables.

Key Result

Proposition 3.1

Let $\pi = \mathcal{N}(-a,\sigma^2)/2 + \mathcal{N}(a, \sigma^2) / 2$ with $a > 0$, $\sigma > 0$ and $\rho = \mathcal{N}(0, 1)$. The unique increasing flow mapping $\pi$ to $\rho$ denoted $T_{\pi,\rho}$ verifies that
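
In one dimension, the unique increasing map pushing $\pi$ to $\rho$ is the composition of the target CDF with the reference quantile function, $T_{\pi,\rho} = F_\rho^{-1} \circ F_\pi$. A minimal sketch of evaluating this map for the mixture above, using only the Python standard library, could look as follows; the function name `monotone_map` and the default values of `a` and `sigma` are illustrative assumptions.

```python
from statistics import NormalDist


def monotone_map(x, a=2.0, sigma=1.0):
    """Evaluate T = F_rho^{-1}(F_pi(x)) for pi = N(-a, sigma^2)/2 + N(a, sigma^2)/2.

    rho is the standard normal N(0, 1).
    """
    left = NormalDist(-a, sigma)
    right = NormalDist(a, sigma)
    # CDF of the equal-weight mixture.
    u = 0.5 * left.cdf(x) + 0.5 * right.cdf(x)
    # Quantile function of N(0, 1); raises StatisticsError if u rounds to 0 or 1,
    # which can happen for x far in the tails.
    return NormalDist(0.0, 1.0).inv_cdf(u)
```

By symmetry of the mixture, `monotone_map(0.0)` is zero up to floating-point error, and the map is odd: `monotone_map(-x)` equals `-monotone_map(x)`.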

Figures (20)

  • Figure 1: (Left) Push-backwards $\lambda_{T_t}^{\rho}$ and push-forwards $\lambda_{T^{-1}_t}^{\pi}$ as a function of the flow imperfection parameter $t$. (Right) Sliced TV distances of different samplers depending on the quality of the flow $t$, using 256 chains of length 1400 initialized with draws from NF with $T_t$. neural-IS was evaluated with 14000 samples. Results were qualitatively unchanged for $d=16,32,64,256$.
  • Figure 2: (Left) Example chains of NF-enhanced walkers with a 2d target mixture of 4 Gaussians. The 128-step MCMC chain is colored according to the closest mode in the data space (bottom row) with corresponding location in the latent space (top row). The complex geometry of the push-backward $\lambda_{T_\alpha}^{\pi}$ hinders the mixing of local-update algorithms. MALA's step size was chosen to reach a 75% acceptance rate. (Middle) Median squared error of the histograms of visited modes of 4 Gaussians per chain against the perfect uniform histogram as a function of dimension. 512 chains of 1000 steps on average were used. (Right) Sliced total variation in sampling the Banana distribution in increasing dimension using a RealNVP. 128 chains of 1024 steps were used.
  • Figure 3: Sampled configurations of alanine-dipeptide projected from 66 Cartesian coordinates to 2 dihedral angles $\phi$ and $\psi$ (see App. \ref{app:aldp}). (Top) Samples from the flow (left) and samples from a single MCMC chain of the different NF-samplers are shown as bright-colored points on a colored background displaying the log histogram of exact samples at $T=300 K$ obtained by a Replica Exchange Molecular Dynamics simulation of vincent_stimper_2022_6993124. (Bottom) Log-histograms of samples from the flow (left) and from 256 MCMC chains started at the same location.
  • Figure 4: (Left) Sampled $\phi^4$ configurations in dimension 128. (Right) Within-mode Sliced TV as a function of dimension.
  • Figure 5: Flow $T_{\mu,\nu}$ in 1D. (Left) Map $T_{\mu,\nu}$ from the latent space (on the x-axis) to the data space (on the y-axis). Each pair of dotted lines highlights a mode in the latent space while the horizontal line shows the mode in the data space. (Middle) Kernel density estimation of the push-forward of $\mu$ through the flow $T_{\mu,\nu}$. (Right) Kernel density estimation of the push-forward of the base ($\mathcal{N}(0,1)$) through a smooth cubic spline approximation of $T_{\mu,\nu}$.
  • ...and 15 more figures

Theorems & Definitions (11)

  • Proposition 3.1
  • Theorem 4.3: Explicit mixing time bounds for IMH
  • Theorem 5.1: Corollary 1.5 Lovasz1993
  • Theorem 5.4: Conductance lower bound for IMH
  • proof
  • Corollary 5.5: Mixing time upper bound for IMH
  • proof
  • Theorem 5.6
  • proof
  • Corollary 5.8
  • ...and 1 more