
Stable Training of Normalizing Flows for High-dimensional Variational Inference

Daniel Andrade

TL;DR

This work tackles the instability of training deep Real NVPs for high-dimensional variational inference. It introduces two stabilizers—asymmetric soft clipping of the coupling scales and the LOFT bijective soft-log transform—to curb extreme samples and reduce gradient variance, enabling stable training in thousands of dimensions. Through extensive experiments on diverse targets, including a Horseshoe-prior logistic regression, the approach yields improved ELBOs and substantially sharper marginal likelihood estimates via importance sampling, outperforming SMC and competing with HMC in posterior quality. The findings offer practical guidance on base distributions, depth, gradient estimation, and training protocols for effective high-dimensional normalizing-flow VI. The work also suggests hybrid NF–MCMC strategies as a promising future direction for scalable Bayesian inference.

Abstract

Variational inference with normalizing flows (NFs) is an increasingly popular alternative to MCMC methods. In particular, NFs based on coupling layers (Real NVPs) are frequently used due to their good empirical performance. In theory, increasing the depth of normalizing flows should lead to more accurate posterior approximations. However, in practice, training deep normalizing flows for approximating high-dimensional posterior distributions is often infeasible due to the high variance of the stochastic gradients. In this work, we show that previous methods for stabilizing the variance of stochastic gradient descent can be insufficient to achieve stable training of Real NVPs. As the source of the problem, we identify that, during training, samples often exhibit unusually high values. As a remedy, we propose a combination of two methods: (1) soft-thresholding of the scale in Real NVPs, and (2) a bijective soft log transformation of the samples. We evaluate these and other previously proposed modifications on several challenging target distributions, including a high-dimensional horseshoe logistic regression model. Our experiments show that with our modifications, stable training of Real NVPs for posteriors with several thousand dimensions is possible, allowing for more accurate marginal likelihood estimation via importance sampling. Moreover, we evaluate several common training techniques and architecture choices and provide practical advice for training NFs for high-dimensional variational inference.
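
The two remedies named in the abstract can be illustrated with a small NumPy sketch. The equations themselves are not reproduced on this page, so the functional forms below are assumptions: an arctan-based clamp that squashes a coupling layer's log-scale `s` into the interval $(-\alpha_{neg}, \alpha_{pos})$, and a soft-log map that is the identity on $[-\tau, \tau]$ and grows logarithmically outside it. The default parameter values follow the figure captions ($\alpha_{neg} = 2$, $\alpha_{pos} = 0.1$, $\tau = 2$); the function names are hypothetical.

```python
import numpy as np

def asym_soft_clamp(s, alpha_neg=2.0, alpha_pos=0.1):
    """Assumed form: smoothly squash s into (-alpha_neg, alpha_pos)."""
    s = np.asarray(s, dtype=float)
    # Use a different bound on each side of zero (hence "asymmetric").
    alpha = np.where(s >= 0.0, alpha_pos, alpha_neg)
    return (2.0 / np.pi) * alpha * np.arctan(s / alpha)

def loft(z, tau=2.0):
    """Assumed form: identity for |z| <= tau, logarithmic growth beyond."""
    z = np.asarray(z, dtype=float)
    a = np.abs(z)
    return np.sign(z) * (np.log1p(np.maximum(a - tau, 0.0)) + np.minimum(a, tau))

def loft_inverse(y, tau=2.0):
    """Inverse of loft: exponential growth beyond tau undoes the log."""
    y = np.asarray(y, dtype=float)
    a = np.abs(y)
    return np.sign(y) * (np.expm1(np.maximum(a - tau, 0.0)) + np.minimum(a, tau))

def loft_log_abs_det(z, tau=2.0):
    """log |det Jacobian| of loft, summed over the last axis.

    Each coordinate has derivative 1 / (max(|z| - tau, 0) + 1).
    """
    a = np.abs(np.asarray(z, dtype=float))
    return -np.sum(np.log1p(np.maximum(a - tau, 0.0)), axis=-1)
```

Under these assumptions, a sample coordinate of $10^6$ is mapped to roughly $\log(10^6) + 2$, which is how such a transform would curb the unusually high sample values described above while remaining invertible with a cheap log-determinant.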

Paper Structure

This paper contains 35 sections, 38 equations, 10 figures, 11 tables.

Figures (10)

  • Figure 1: Shows the asymmetric soft clamp function $c(s)$ from Equation \ref{eq:asym_soft_clamp} with $\alpha_{neg} = 2$ and $\alpha_{pos} = 0.1$.
  • Figure 2: Shows the LOFT function $g(z)$ from Equation \ref{eq:LOFT} with $\tau = 2$.
  • Figure 3: Shows the ELBO and the log marginal likelihood estimate using importance sampling ($d = 1000$). The red line shows the true log marginal likelihood.
  • Figure 4: Shows the ELBO and the log marginal likelihood estimate using importance sampling ($d = 1000$; for the Conjugate Linear Regression model, $d = 1001$). The red line shows the true log marginal likelihood.
  • Figure 5: Plot of ELBO vs. error in the marginal likelihood estimate for 13 different variational inference methods/models, for each target model; also shows the 95% confidence interval of the Pearson correlation $\rho$, estimated with bootstrapping.
  • ...and 5 more figures
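
Several of the figures above compare the ELBO with an importance-sampling estimate of the log marginal likelihood. As a minimal sketch of how the two quantities relate, the example below computes both from the same log importance weights $\log w_i = \log p(x, \theta_i) - \log q(\theta_i)$. The toy model is an assumption chosen for illustration and is not one of the paper's targets: a one-dimensional conjugate Gaussian, where the true log marginal likelihood is available in closed form, with a deliberately imperfect Gaussian standing in for the flow $q$.

```python
import numpy as np

def log_normal(x, mean, std):
    """Log density of a univariate normal distribution."""
    return -0.5 * np.log(2.0 * np.pi * std**2) - 0.5 * ((x - mean) / std) ** 2

def elbo_and_log_marginal(log_w):
    """Return (ELBO estimate, importance-sampling estimate of log p(x))."""
    n = log_w.shape[0]
    elbo = np.mean(log_w)  # average of the log weights
    m = np.max(log_w)      # shift by the max for a stable log-mean-exp
    log_z_hat = m + np.log(np.sum(np.exp(log_w - m))) - np.log(n)
    return elbo, log_z_hat

rng = np.random.default_rng(0)
x_obs = 1.0                # toy observation (assumed)
q_mean, q_std = 0.4, 0.8   # imperfect approximation (assumed)

# Toy model: theta ~ N(0, 1), x | theta ~ N(theta, 1).
theta = rng.normal(q_mean, q_std, size=100_000)
log_joint = log_normal(theta, 0.0, 1.0) + log_normal(x_obs, theta, 1.0)
log_w = log_joint - log_normal(theta, q_mean, q_std)

elbo, log_z_hat = elbo_and_log_marginal(log_w)
true_log_z = log_normal(x_obs, 0.0, np.sqrt(2.0))  # marginal is N(0, 2)
print(f"ELBO={elbo:.4f}  IS estimate={log_z_hat:.4f}  true={true_log_z:.4f}")
```

By Jensen's inequality the sample ELBO never exceeds the importance-sampling estimate computed from the same weights, and the better $q$ matches the posterior, the lower the variance of the weights, which is why a more accurate flow would be expected to give a tighter marginal likelihood estimate.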