VISA: Variational Inference with Sequential Sample-Average Approximations

Heiko Zimmermann, Christian A. Naesseth, Jan-Willem van de Meent

TL;DR

VISA introduces sequential sample-average approximations (SAAs) within a trust-region framework to accelerate variational inference for expensive, non-differentiable simulation-based models by reusing model evaluations. It targets the forward KL objective and uses fixed samples from a proposal to form a deterministic surrogate, refreshing the surrogate when the effective sample size (ESS) drops below a threshold. Across high-dimensional Gaussians, Lotka–Volterra dynamics, and a Pickover attractor, VISA achieves posterior quality comparable to importance-weighted forward-KL variational inference (IWFVI) while reducing the number of model evaluations by roughly a factor of two when learning rates are chosen conservatively. The method trades some risk of posterior-variance underestimation for substantial computational savings, and its efficacy depends on careful choice of the ESS threshold and learning rate. VISA is particularly suited for models where simulator evaluations dominate cost and differentiation is unavailable or impractical.

Abstract

We present variational inference with sequential sample-average approximation (VISA), a method for approximate inference in computationally intensive models, such as those based on numerical simulations. VISA extends importance-weighted forward-KL variational inference by employing a sequence of sample-average approximations, which are considered valid inside a trust region. This makes it possible to reuse model evaluations across multiple gradient steps, thereby reducing computational cost. We perform experiments on high-dimensional Gaussians, Lotka-Volterra dynamics, and a Pickover attractor, which demonstrate that VISA can achieve comparable approximation accuracy to standard importance-weighted forward-KL variational inference with computational savings of a factor of two or more for conservatively chosen learning rates.
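The reuse of model evaluations described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a one-dimensional Gaussian variational family, plain gradient ascent, and a trust region measured by the normalized ESS of the density ratios between the current approximation and the one that generated the samples. The names `log_joint`, `normalized_ess`, and `visa_sketch` are hypothetical, and `log_joint` is a cheap stand-in for an expensive simulator-based model.

```python
import numpy as np


def log_normal(z, mu, log_sigma):
    """Log-density of N(mu, exp(log_sigma)^2) evaluated at z."""
    sigma = np.exp(log_sigma)
    return -0.5 * ((z - mu) / sigma) ** 2 - log_sigma - 0.5 * np.log(2.0 * np.pi)


def log_joint(z):
    """Hypothetical stand-in for an expensive model: unnormalized N(2, 0.5^2)."""
    return -0.5 * ((z - 2.0) / 0.5) ** 2


def normalized_ess(log_ratio):
    """Effective sample size of unnormalized log-weights, scaled to (0, 1]."""
    w = np.exp(log_ratio - log_ratio.max())
    return w.sum() ** 2 / (len(w) * (w ** 2).sum())


def visa_sketch(num_steps=2000, n=100, lr=0.01, alpha=0.9, seed=0):
    rng = np.random.default_rng(seed)
    mu, log_sigma = 0.0, 0.0      # current variational parameters
    z = log_q_anchor = w = None   # fixed sample set defining the current SAA
    num_evals = 0                 # number of model evaluations spent so far

    for _ in range(num_steps):
        # Assumed trust-region test: has q drifted too far from the
        # distribution that generated the fixed samples?
        refresh = z is None
        if not refresh:
            log_ratio = log_normal(z, mu, log_sigma) - log_q_anchor
            refresh = normalized_ess(log_ratio) < alpha

        if refresh:
            # Build a new SAA: draw fresh samples and evaluate the model once.
            z = mu + np.exp(log_sigma) * rng.standard_normal(n)
            log_q_anchor = log_normal(z, mu, log_sigma)
            log_w = log_joint(z) - log_q_anchor
            w = np.exp(log_w - log_w.max())
            w /= w.sum()          # self-normalized importance weights
            num_evals += n

        # Gradient ascent on the deterministic surrogate sum_i w_i log q(z_i),
        # which needs no further model evaluations.
        sigma = np.exp(log_sigma)
        d = (z - mu) / sigma
        mu += lr * np.sum(w * d / sigma)
        log_sigma += lr * np.sum(w * (d ** 2 - 1.0))

    return mu, np.exp(log_sigma), num_evals


if __name__ == "__main__":
    mu, sigma, num_evals = visa_sketch()
    print(f"mu={mu:.3f} sigma={sigma:.3f} model evaluations={num_evals}")
```

In this sketch a standard importance-weighted scheme would spend `num_steps * n` model evaluations, whereas the loop above only pays for a new batch when the ESS test fails. Lowering `alpha` increases reuse but lets the parameters move further on stale samples, which is the trade-off the summary attributes to the ESS threshold.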

Paper Structure (21 sections, 30 equations, 4 figures, 1 algorithm)

Figures (4)

  • Figure 1: Visualization of parameter traces and trust regions corresponding to different SAAs. If after an update $\phi \notin S_{\mathcal{Z},\alpha({\tilde{\phi}})}$, we set ${\tilde{\phi}} \leftarrow \phi$ to construct a new SAA and corresponding trust region.
  • Figure 2: Symmetric KL-divergence as a function of the number of model evaluations for a Gaussian target with diagonal covariance matrix (top row) and dense covariance matrix (bottom row). For small learning rates (0.001, 0.005, 0.01), IWFVI and BBVI-SF need a larger number of model evaluations to converge. VISA converges much faster, as it compensates for the small step size by reusing samples. For a learning rate of 0.05, VISA fails to reliably converge, while IWFVI still converges. Overall, VISA converges faster than or at the same rate as IWFVI and BBVI-SF with the same or higher learning rates.
  • Figure 3: Results for the Lotka-Volterra model with different learning rates. (Top row) Training objective over number of model evaluations. (Middle row) Approximate forward KL-divergence computed on reference samples obtained by MCMC. For smaller step sizes (0.001, 0.005), VISA achieves comparable forward KL-divergence to IWFVI while requiring significantly fewer model evaluations to converge (see vertical lines). For larger step sizes (0.01), VISA only converges with a high ESS threshold (0.99), for which it requires approximately the same number of evaluations as IWFVI. (Bottom row) Gradient steps over number of batch evaluations of the model; each batch evaluation corresponds to evaluating a batch of $N=100$ samples. VISA requires fewer evaluations per gradient step compared to IWFVI.
  • Figure 4: Results for the Pickover attractor. (a) Approximate log-joint density over number of batch evaluations of the model. (b) Log-joint approximation plotted over the domain of the prior. The variational approximation captures the high-density area containing the data. (c) Visualization of the Pickover attractor with ground truth parameters $\theta=[-2.3, 1.25]$. (d) Visualization of the attractor with average system parameters computed over $10{,}000$ samples from the learned variational approximation. Each evaluation in the plot corresponds to evaluating a batch of $N=10$ samples.