Annealing Flow Generative Models Towards Sampling High-Dimensional and Multi-Modal Distributions
Dongze Wu, Yao Xie
TL;DR
Annealing Flow (AF) is a sampler based on Continuous Normalizing Flows (CNFs) for high-dimensional, multi-modal distributions, trained with a dynamic Optimal Transport (OT) objective and guided by annealing. AF decomposes the transport to the target into a sequence of intermediate densities and learns the velocity fields in a neural ODE framework, using a KL-based objective augmented with dynamic Wasserstein-2 regularization. Theoretically, the infinitesimal optimal velocity equals the difference between the scores of consecutive annealing densities, which connects AF to Stein operators and Wasserstein gradient flow. Empirically, AF outperforms state-of-the-art normalizing flow (NF) and MCMC-based methods across challenging distributions, often with far fewer time steps and with offline training. The paper also develops Importance Flow, which combines density-ratio estimation with AF to enable low-variance importance sampling, with potential distribution-free extensions. Overall, AF provides a scalable, stable, and efficient approach to sampling in high-dimensional, multi-modal settings, with practical implications for Bayesian inference and statistical physics.
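To make the "score difference between consecutive annealing densities" concrete, here is a minimal one-dimensional sketch. It is an illustration under stated assumptions, not the paper's implementation: the geometric annealing path f_k ∝ π0^(1-β_k) q^(β_k), the standard normal base π0, the two-mode Gaussian mixture target q, the linear β schedule, and all function names are assumptions introduced for this example.

```python
import numpy as np

M = 4.0  # assumed mode location: target is an equal-weight mixture of N(-M, 1) and N(+M, 1)

def score_base(x):
    # Score (gradient of log-density) of the assumed base pi0 = N(0, 1).
    return -x

def score_target(x):
    # Score of the assumed mixture target; closed form is -x + M * tanh(M * x).
    return -x + M * np.tanh(M * x)

def annealed_score(x, beta):
    # Score of the assumed geometric interpolation f_beta ∝ pi0^(1 - beta) * q^beta.
    return (1.0 - beta) * score_base(x) + beta * score_target(x)

# Assumed linear annealing schedule with 5 steps from the base (beta=0) to the target (beta=1).
betas = np.linspace(0.0, 1.0, 6)
x = np.linspace(-6.0, 6.0, 13)

# Score difference between each pair of consecutive annealing densities.
score_diffs = [
    annealed_score(x, b1) - annealed_score(x, b0)
    for b0, b1 in zip(betas[:-1], betas[1:])
]

# The per-step differences telescope to the full target-minus-base score gap.
total = np.sum(score_diffs, axis=0)
```

Under these assumptions each per-step difference reduces to (β_k − β_{k−1}) · M · tanh(M x), which is positive for x > 0 and negative for x < 0, so every annealing step pushes mass away from the origin by a small, controlled amount rather than in one large jump.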
Abstract
Sampling from high-dimensional, multi-modal distributions remains a fundamental challenge across domains such as statistical Bayesian inference and physics-based machine learning. In this paper, we propose Annealing Flow (AF), a method built on Continuous Normalizing Flow (CNF) for sampling from high-dimensional and multi-modal distributions. AF is trained with a dynamic Optimal Transport (OT) objective incorporating Wasserstein regularization, and guided by annealing procedures, facilitating effective exploration of modes in high-dimensional spaces. Compared to recent Normalizing Flow (NF) methods, AF greatly improves training efficiency and stability, with minimal reliance on MC assistance. We demonstrate the superior performance of AF compared to state-of-the-art methods through experiments on various challenging distributions and real-world datasets, particularly in high-dimensional and multi-modal settings. We also highlight AF's potential for sampling the least favorable distributions.
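As a rough sketch of what a KL objective with dynamic Wasserstein regularization can look like for a single annealing step, one may write the following. The notation is assumed for illustration and is not taken from the source: f_{k-1} and f_k denote consecutive annealing densities, [t_{k-1}, t_k] the time interval of step k, ρ_t the law of x(t), v the velocity field, and γ > 0 a regularization weight. The kinetic-energy integral is the standard dynamic (Benamou-Brenier) form of the squared Wasserstein-2 distance.

```latex
\min_{v}\;
\mathrm{KL}\!\left(\rho_{t_k} \,\middle\|\, f_k\right)
+ \gamma \int_{t_{k-1}}^{t_k}
  \mathbb{E}_{x(t)\sim\rho_t}\!\left[\,\lVert v(x(t),t)\rVert^{2}\right] dt
\quad \text{s.t.} \quad
\frac{dx(t)}{dt} = v(x(t),t), \qquad x(t_{k-1}) \sim f_{k-1}.
```

In this form, the KL term pulls the transported density toward the next annealing density, while the kinetic-energy term penalizes long or erratic particle trajectories.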
