Empirical evaluation of normalizing flows in Markov Chain Monte Carlo
David Nabergoj, Erik Štrumbelj
TL;DR
There is little systematic guidance on which normalizing flow (NF) architecture to use in NF-based MCMC (NFMC). This paper addresses that gap with an extensive empirical evaluation of autoregressive, residual, and continuous NF families, each paired with NeuTra preconditioning and Jump MCMC strategies on synthetic and real-world target distributions. It shows that, with appropriate hyperparameters, NFMC can outperform classical MCMC, that Jump HMC performs consistently well, and that independent NF jumps are a key driver of the gains over preconditioning alone. The study gives practical architectural recommendations, such as favoring i-ResNet or CNFRK(R) in gradient-based scenarios and coupling or IA-based NFs for high-dimensional or multimodal targets, and it highlights stability considerations and the importance of tuning. By providing access to the dataset and code, it gives practitioners a foundation for adopting NFMC methods and gives researchers a comparative benchmark for refining architecture choices and developing new NFMC samplers with informed biases. Together, the guidelines and benchmark clarify when NFMC is advantageous and how to select architectures to maximize sampling efficiency and moment accuracy.
Abstract
Recent advances in MCMC use normalizing flows to precondition target distributions and enable jumps to distant regions. However, there is currently no systematic comparison of different normalizing flow architectures for MCMC. As such, many works choose simple flow architectures that are readily available and do not consider other models. Guidelines for choosing an appropriate architecture would reduce analysis time for practitioners and motivate researchers to take the recommended models as foundations to be improved. We provide the first such guideline by extensively evaluating many normalizing flow architectures on various flow-based MCMC methods and target distributions. When the target density gradient is available, we show that flow-based MCMC outperforms classic MCMC for suitable NF architecture choices with minor hyperparameter tuning. When the gradient is unavailable, flow-based MCMC wins with off-the-shelf architectures. We find contractive residual flows to be the best general-purpose models with relatively low sensitivity to hyperparameter choice. We also provide various insights into normalizing flow behavior within MCMC when varying their hyperparameters, properties of target distributions, and the overall computational budget.
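To make the idea of flow-based jumps concrete, the following is a minimal sketch of a Jump MCMC loop in one dimension. It is not the authors' implementation. A trained normalizing flow is replaced by a fixed wide Gaussian (`flow_sample`, `flow_log_prob`), since all the jump step needs from the flow is sampling and an exact log density. The bimodal target, the random-walk local kernel, the step size, and `jump_every` are all illustrative assumptions.

```python
import math
import random

def log_target(x):
    # Unnormalized bimodal target (assumed for illustration):
    # equal mixture of N(-3, 1) and N(3, 1), computed with log-sum-exp.
    a = -0.5 * (x + 3.0) ** 2
    b = -0.5 * (x - 3.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

# Hypothetical stand-in for a trained flow: a wide Gaussian N(0, FLOW_SCALE^2).
FLOW_SCALE = 4.0

def flow_sample(rng):
    return rng.gauss(0.0, FLOW_SCALE)

def flow_log_prob(x):
    return -0.5 * (x / FLOW_SCALE) ** 2 - math.log(FLOW_SCALE * math.sqrt(2.0 * math.pi))

def accept(log_alpha, rng):
    # Metropolis-Hastings test: accept with probability min(1, exp(log_alpha)).
    return rng.random() < math.exp(min(0.0, log_alpha))

def local_step(x, rng, step=0.5):
    # Random-walk Metropolis: symmetric proposal, so only the target ratio matters.
    prop = x + rng.gauss(0.0, step)
    return prop if accept(log_target(prop) - log_target(x), rng) else x

def jump_step(x, rng):
    # Independence proposal drawn from the flow; the acceptance ratio
    # corrects for the mismatch between the flow density and the target.
    prop = flow_sample(rng)
    log_alpha = (log_target(prop) - log_target(x)) + (flow_log_prob(x) - flow_log_prob(prop))
    return prop if accept(log_alpha, rng) else x

def jump_mcmc(n_steps, jump_every=10, seed=0):
    rng = random.Random(seed)
    x = -3.0  # start in the left mode
    chain = []
    for t in range(1, n_steps + 1):
        x = jump_step(x, rng) if t % jump_every == 0 else local_step(x, rng)
        chain.append(x)
    return chain

chain = jump_mcmc(5000)
frac_right = sum(1 for x in chain if x > 0) / len(chain)
print(f"fraction of samples in the right mode: {frac_right:.2f}")
```

The local kernel rarely crosses the low-density region between the modes, so in this sketch the jump proposals are what move the chain between them. Preconditioning in the NeuTra style, where the local kernel runs in the flow's latent space, is a separate mechanism and is not shown here.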
