Differentiable Annealed Importance Sampling Minimizes The Symmetrized Kullback-Leibler Divergence Between Initial and Target Distribution
Johannes Zenn, Robert Bamler
TL;DR
This work analyzes differentiable annealed importance sampling (DAIS) and proves that, in the limit of many transitions, DAIS minimizes the symmetrized KL divergence between the learnable initial distribution $q_0$ and the target distribution $f/Z$, which gives DAIS a variational interpretation. By introducing DAIS$_0$, the authors treat $q_0$ as an explicit, compact approximate posterior that can be used directly for inference, avoiding the computational cost of running full AIS at test time. Empirically, DAIS$_0$ often yields more accurate uncertainty estimates than reverse-KL VI, IWVI, and MSC, especially in higher-dimensional settings, while keeping a tractable and interpretable representation. The findings connect AIS, VI, and MCMC-based methods, show practical benefits for Gaussian process regression and Bayesian logistic regression, and illustrate the trade-off between mode-seeking and mass-covering behavior in variational approximations.
Abstract
Differentiable annealed importance sampling (DAIS), proposed by Geffner & Domke (2021) and Zhang et al. (2021), allows optimizing over the initial distribution of AIS. In this paper, we show that, in the limit of many transitions, DAIS minimizes the symmetrized Kullback-Leibler divergence between the initial and target distribution. Thus, DAIS can be seen as a form of variational inference (VI) as its initial distribution is a parametric fit to an intractable target distribution. We empirically evaluate the usefulness of the initial distribution as a variational distribution on synthetic and real-world data, observing that it often provides more accurate uncertainty estimates than VI (optimizing the reverse KL divergence), importance weighted VI, and Markovian score climbing (optimizing the forward KL divergence).
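For reference, with $p = f/Z$ the symmetrized KL divergence is commonly written as the sum of the reverse and forward divergences, $\mathrm{KL}(q_0 \,\|\, p) + \mathrm{KL}(p \,\|\, q_0)$; the paper's exact normalization (for example, a factor of $1/2$) is not stated in this excerpt, so the sum is an assumption here. The following minimal sketch illustrates the quantity for univariate Gaussians, where both terms have a closed form. The function names `kl_gauss` and `symmetrized_kl` are hypothetical and not taken from the paper, and the sketch does not implement DAIS itself.

```python
import math


def kl_gauss(m1, s1, m2, s2):
    """Closed-form KL( N(m1, s1^2) || N(m2, s2^2) ) for univariate Gaussians."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2.0 * s2**2) - 0.5


def symmetrized_kl(m_q, s_q, m_p, s_p):
    """Sum of reverse KL(q || p) and forward KL(p || q).

    Assumption: plain sum, without a 1/2 normalization factor.
    """
    reverse = kl_gauss(m_q, s_q, m_p, s_p)  # mode-seeking term (standard VI)
    forward = kl_gauss(m_p, s_p, m_q, s_q)  # mass-covering term (e.g. MSC)
    return reverse + forward


if __name__ == "__main__":
    # q is narrower than p: the two directions of the KL disagree,
    # while their sum is unchanged when q and p are swapped.
    print("reverse KL:", kl_gauss(0.0, 1.0, 0.0, 2.0))
    print("forward KL:", kl_gauss(0.0, 2.0, 0.0, 1.0))
    print("symmetrized:", symmetrized_kl(0.0, 1.0, 0.0, 2.0))
```

Because the symmetrized divergence penalizes both underestimating and overestimating the spread of the target, a $q_0$ fitted under it would be expected to sit between the mode-seeking reverse-KL fit and the mass-covering forward-KL fit.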
