Annealing Flow Generative Models Towards Sampling High-Dimensional and Multi-Modal Distributions

Dongze Wu, Yao Xie

TL;DR

Annealing Flow (AF) introduces a Continuous Normalizing Flow (CNF)-based sampler, guided by a dynamic Optimal Transport (OT) objective and an annealing schedule, for high-dimensional, multi-modal distributions. AF decomposes the transport to the target into a sequence of intermediate (annealing) densities and learns a velocity field for each step within a neural ODE framework, using a KL-based objective augmented by a dynamic Wasserstein-2 regularization. Theoretically, the infinitesimal optimal velocity equals the score difference between consecutive annealing densities, which connects AF to Stein operators and Wasserstein gradient flow. Empirically, AF outperforms state-of-the-art normalizing flow (NF) and MCMC-based methods on challenging distributions, often with far fewer time steps and with offline training. The paper also develops Importance Flow, which combines density-ratio estimation with AF to enable low-variance importance sampling and possible distribution-free extensions. Overall, AF provides a scalable, stable, and efficient approach to sampling in high-dimensional, multi-modal settings, with practical implications for Bayesian inference and statistical physics.
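The score-difference result can be illustrated numerically. The sketch below assumes, purely for illustration, a geometric annealing path $\tilde f_k \propto \pi_0^{1-\beta_k} q^{\beta_k}$ between a standard Gaussian base $\pi_0$ and an equal-weight Gaussian-mixture target $q$, with a linear schedule such as the $\beta_k \in \{0, 1/3, 2/3, 1\}$ shown in Figures 2 to 5. Under that assumption the difference of consecutive annealed scores reduces to $(\beta_k - \beta_{k-1})(\nabla \log q - \nabla \log \pi_0)$. All function names are hypothetical, and any rescaling of the velocity by the regularization weight is ignored.

```python
import numpy as np


def base_score(x):
    # Score of the assumed standard Gaussian base pi_0: grad log pi_0(x) = -x.
    return -x


def mixture_score(x, means, sigma=1.0):
    # Score of an assumed equal-weight isotropic Gaussian mixture target q.
    # x: (n, d) query points, means: (m, d) component centers.
    diff = x[:, None, :] - means[None, :, :]                # (n, m, d)
    logits = -0.5 * np.sum(diff**2, axis=-1) / sigma**2      # (n, m)
    logits -= logits.max(axis=1, keepdims=True)              # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)                        # component responsibilities
    return -np.einsum("nm,nmd->nd", w, diff) / sigma**2


def annealed_score(x, beta, means):
    # Assumed geometric path: log f_beta = (1 - beta) log pi_0 + beta log q + const.
    return (1.0 - beta) * base_score(x) + beta * mixture_score(x, means)


def score_difference_velocity(x, beta_prev, beta_next, means):
    # Velocity for one annealing step: grad log f_k(x) - grad log f_{k-1}(x).
    return annealed_score(x, beta_next, means) - annealed_score(x, beta_prev, means)


betas = np.linspace(0.0, 1.0, 4)  # 0, 1/3, 2/3, 1
```

For example, `score_difference_velocity(x, betas[1], betas[2], means)` evaluates the step-2 velocity at a batch of points `x`; each step pushes samples by one third of the gap between the target score and the base score.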

Abstract

Sampling from high-dimensional, multi-modal distributions remains a fundamental challenge across domains such as Bayesian statistical inference and physics-based machine learning. In this paper, we propose Annealing Flow (AF), a method built on Continuous Normalizing Flow (CNF) for sampling from high-dimensional and multi-modal distributions. AF is trained with a dynamic Optimal Transport (OT) objective incorporating Wasserstein regularization and is guided by an annealing procedure, which facilitates effective exploration of modes in high-dimensional spaces. Compared to recent normalizing flow (NF) methods, AF greatly improves training efficiency and stability, with minimal reliance on Monte Carlo (MC) assistance. We demonstrate the superior performance of AF compared to state-of-the-art methods through experiments on various challenging distributions and real-world datasets, particularly in high-dimensional and multi-modal settings. We also highlight AF's potential for sampling the least favorable distributions.

Paper Structure

This paper contains 35 sections, 3 theorems, 78 equations, 14 figures, 12 tables, and 3 algorithms.

Key Result

Proposition 3.1

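Given samples from $f_{k-1}$, the KL divergence between the density $f_k$, obtained by transporting $f_{k-1}$ along $v_k$ over the time block $[t_{k-1}, t_k]$, and the $k$-th annealing density $\tilde f_k$ satisfies

$$\mathrm{KL}(f_k \,\|\, \tilde f_k) = \mathbb{E}_{x(t_{k-1}) \sim f_{k-1}}\left[ -\log \tilde f_k\big(x(t_k)\big) - \int_{t_{k-1}}^{t_k} \nabla \cdot v_k\big(x(s), s\big)\, ds \right] + c,$$

up to a constant $c$ that is independent of $v_{k}(x(s),s)$.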
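Proposition 3.1 makes the KL term computable from samples of $f_{k-1}$ alone. By the instantaneous change-of-variables formula for CNFs, $\log f_k(x(t_k)) = \log f_{k-1}(x(t_{k-1})) - \int_{t_{k-1}}^{t_k} \nabla\cdot v_k(x(s),s)\,ds$, so the only $v_k$-dependent part of the KL divergence to the $k$-th annealing density $\tilde f_k$ is the sample mean of $-\log \tilde f_k(x(t_k)) - \int \nabla\cdot v_k\,ds$. The sketch below estimates that quantity with forward-Euler integration and an exactly known divergence, and optionally adds a kinetic-energy penalty $\gamma \int \mathbb{E}\|v_k\|^2\,ds$ as a stand-in for the dynamic Wasserstein-2 regularization. The Euler discretization, the weight `gamma`, and all function names are illustrative assumptions, not the paper's implementation; in high dimensions the exact divergence would typically be replaced by a stochastic trace estimator such as Hutchinson's.

```python
import numpy as np


def flow_with_divergence(x0, velocity, divergence, t0, t1, n_steps):
    """Forward-Euler integration of dx/ds = v(x, s) from t0 to t1.

    Returns the transported points, the per-sample integral of div v along
    each path, and the per-sample integral of ||v||^2 (kinetic energy).
    """
    x = np.array(x0, dtype=float)
    h = (t1 - t0) / n_steps
    int_div = np.zeros(x.shape[0])
    kinetic = np.zeros(x.shape[0])
    for i in range(n_steps):
        s = t0 + i * h
        v = velocity(x, s)
        int_div += h * divergence(x, s)
        kinetic += h * np.sum(v**2, axis=1)
        x = x + h * v
    return x, int_div, kinetic


def block_loss(x0, velocity, divergence, log_target_unnorm, t0, t1, n_steps, gamma=0.0):
    """Monte Carlo estimate of KL(f_k || f~_k) up to a constant, plus an
    optional kinetic-energy (dynamic W2-style) penalty weighted by gamma."""
    xT, int_div, kinetic = flow_with_divergence(x0, velocity, divergence, t0, t1, n_steps)
    kl_up_to_const = np.mean(-log_target_unnorm(xT) - int_div)
    return kl_up_to_const + gamma * np.mean(kinetic)


# Toy check: x0 ~ N(0, I) in 2-D and a linear field v(x, s) = a * x, whose
# divergence is a * d. The exact flow over [0, 1] scales x0 by exp(a), so
# a = log 2 maps N(0, I) onto the unnormalized N(0, 4 I) target below.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(10_000, 2))


def log_target(x):
    return -0.5 * np.sum(x**2, axis=1) / 4.0


def linear_loss(a, n_steps=100):
    return block_loss(x0, lambda x, s: a * x,
                      lambda x, s: np.full(x.shape[0], a * x.shape[1]),
                      log_target, 0.0, 1.0, n_steps)


print(linear_loss(0.0), linear_loss(np.log(2.0)))  # the second value is lower
```

Because the constant $c$ is dropped, only differences between loss values are meaningful: here the gap between the two printed values approximates $\mathrm{KL}(\mathcal N(0, I)\,\|\,\mathcal N(0, 4I))$, up to Euler and sampling error.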

Figures (14)

  • Figure 1: Illustrations and comparisons of annealing trajectories: Annealing Flow (based on optimal transport maps) versus other methods based on diffusion and score matching.
  • Figure 2: $\beta_0 = 0$
  • Figure 3: $\beta_1 = 1/3$
  • Figure 4: $\beta_2 = 2/3$
  • Figure 5: $\beta_3 = 1$
  • ...and 9 more figures

Theorems & Definitions (3)

  • Proposition 3.1
  • Lemma 3.2
  • Theorem 3.3