EigenVI: score-based variational inference with orthogonal function expansions

Diana Cai, Chirag Modi, Charles C. Margossian, Robert M. Gower, David M. Blei, Lawrence K. Saul

TL;DR

EigenVI, an eigenvalue-based approach for black-box variational inference (BBVI), is developed and used to approximate a variety of target distributions, including a benchmark suite of Bayesian models from posteriordb.

Abstract

We develop EigenVI, an eigenvalue-based approach for black-box variational inference (BBVI). EigenVI constructs its variational approximations from orthogonal function expansions. For distributions over $\mathbb{R}^D$, the lowest-order term in these expansions provides a Gaussian variational approximation, while higher-order terms provide a systematic way to model non-Gaussianity. These approximations are flexible enough to model complex distributions (multimodal, asymmetric), but they are simple enough that one can calculate their low-order moments and draw samples from them. EigenVI can also model other types of random variables (e.g., nonnegative, bounded) by constructing variational approximations from different families of orthogonal functions. Within these families, EigenVI computes the variational approximation that best matches the score function of the target distribution by minimizing a stochastic estimate of the Fisher divergence. Notably, this optimization reduces to solving a minimum eigenvalue problem, so that EigenVI effectively sidesteps the iterative gradient-based optimizations that are required for many other BBVI algorithms. (Gradient-based methods can be sensitive to learning rates, termination criteria, and other tunable hyperparameters.) We use EigenVI to approximate a variety of target distributions, including a benchmark suite of Bayesian models from posteriordb. On these distributions, we find that EigenVI is more accurate than existing methods for Gaussian BBVI.
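The reduction to a minimum eigenvalue problem can be illustrated in one dimension. The sketch below is a minimal illustration, not the authors' implementation: it assumes $q(z)=\psi(z)^2$ with $\psi(z)=\sum_k\alpha_k\phi_k(z)$ built from orthonormal (probabilists') Hermite functions, samples from a Gaussian proposal, and an importance-weighted Monte Carlo estimate of the Fisher divergence. The helper names `hermite_features`, `fit_eigenvi_1d`, and `mean_of_q` and the parameters `num_samples` and `proposal_scale` are hypothetical. The mean is computed in closed form from the Hermite recurrence, illustrating the claim that low-order moments are easy to obtain.

```python
import numpy as np

# Hypothetical helpers illustrating the idea; assumes q(z) = psi(z)**2 with
# psi(z) = sum_k alpha_k phi_k(z) and orthonormal Hermite functions phi_k.

def hermite_features(z, K):
    """Return phi_k(z) and phi_k'(z) for k = 0..K-1, each of shape (K, len(z)).

    phi_k(z) = He_k(z) / sqrt(k!) * (2 pi)^(-1/4) * exp(-z^2 / 4), so phi_0(z)^2
    is the standard normal density and the phi_k are orthonormal on the real line.
    """
    z = np.asarray(z, dtype=float)
    h = np.zeros((K, z.size))                  # h_k = He_k / sqrt(k!)
    h[0] = 1.0
    if K > 1:
        h[1] = z
    for k in range(1, K - 1):                  # normalized three-term recurrence
        h[k + 1] = (z * h[k] - np.sqrt(k) * h[k - 1]) / np.sqrt(k + 1)
    dh = np.zeros_like(h)                      # h_k' = sqrt(k) h_{k-1}
    for k in range(1, K):
        dh[k] = np.sqrt(k) * h[k - 1]
    g = (2 * np.pi) ** -0.25 * np.exp(-z**2 / 4)   # square root of N(z; 0, 1)
    return g * h, g * (dh - 0.5 * z * h)


def fit_eigenvi_1d(score, K, num_samples=2000, proposal_scale=2.0, seed=0):
    """Minimize an importance-weighted Fisher-divergence estimate over unit alpha."""
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, proposal_scale, num_samples)          # z_b ~ pi = N(0, s^2)
    pi = np.exp(-0.5 * (z / proposal_scale) ** 2) / (proposal_scale * np.sqrt(2 * np.pi))
    phi, dphi = hermite_features(z, K)
    # q * (d/dz log q - score)^2 = (2 psi' - psi * score)^2, which is linear in alpha.
    a = 2 * dphi - phi * score(z)                              # shape (K, B)
    M = (a / pi) @ a.T / num_samples                           # quadratic form alpha^T M alpha
    M = 0.5 * (M + M.T)                                        # enforce exact symmetry
    eigvals, eigvecs = np.linalg.eigh(M)                       # ascending order
    return eigvecs[:, 0], eigvals[0]                           # unit-norm minimizer


def mean_of_q(alpha):
    """E_q[z] = 2 sum_k sqrt(k+1) alpha_k alpha_{k+1}, using z phi_k = sqrt(k+1) phi_{k+1} + sqrt(k) phi_{k-1}."""
    k = np.arange(len(alpha) - 1)
    return 2.0 * np.sum(np.sqrt(k + 1) * alpha[:-1] * alpha[1:])


# Usage: a Gaussian target N(0.5, 1), whose score is -(z - 0.5).
alpha, lam = fit_eigenvi_1d(lambda z: -(z - 0.5), K=8)
print(np.sum(alpha**2), mean_of_q(alpha))   # sum of squares is 1 up to rounding; mean is close to 0.5
```

Because $\phi_0(z)^2$ is the standard normal density, fitting the target $\mathcal{N}(0,1)$ (score $-z$) returns $\alpha\approx\pm e_0$, the Gaussian case described above; the shifted target needs the higher-order terms, which is the behavior Figure 2 describes for non-standardized targets.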

Paper Structure

This paper contains 25 sections, 1 theorem, 57 equations, 9 figures, 2 tables.

Key Result

Lemma C.1

Let $\{\phi_k(z)\}_{k=1}^\infty$ be an orthogonal function expansion, and let $q\in\mathcal{Q}_K$ be the variational approximation parameterized by $q(z)=\big(\sum_{k=1}^K \alpha_k\phi_k(z)\big)^2$, where the weights satisfy $\sum_{k=1}^K \alpha_k^2=1$, thus ensuring that the distribution is normalized. Suppose furthermore that $q$ is chosen to minimize the empirical estimate of the Fisher divergence given in eq. (eq-empirical-divergence).
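Why the normalization constraint suffices, and why minimizing the divergence becomes an eigenvalue problem, can be seen in a few lines. The derivation below is a sketch under stated assumptions: it takes the $\phi_k$ to be orthonormal, writes $\psi(z)=\sum_{k=1}^K\alpha_k\phi_k(z)$ so that $q=\psi^2$, and assumes for illustration that the empirical divergence is an importance-weighted average over samples $z_1,\dots,z_B$ drawn from a proposal $\pi$. The symbols $\psi$, $\pi$, $B$, $a_k$, and $M$ are our notation and may differ from the paper's.

```latex
\int q(z)\,dz = \sum_{j,k=1}^K \alpha_j\alpha_k \int \phi_j(z)\,\phi_k(z)\,dz = \sum_{k=1}^K \alpha_k^2 = 1,
\qquad
\nabla \log q(z) = \frac{2\,\nabla\psi(z)}{\psi(z)},
\\[4pt]
q(z)\,\big\|\nabla\log q(z) - \nabla\log p(z)\big\|^2
  = \big\|2\nabla\psi(z) - \psi(z)\,\nabla\log p(z)\big\|^2
  = \Big\|\sum_{k=1}^K \alpha_k\, a_k(z)\Big\|^2,
\qquad a_k(z) = 2\nabla\phi_k(z) - \phi_k(z)\,\nabla\log p(z),
\\[4pt]
\widehat{\mathcal{D}}(\alpha)
  = \frac{1}{B}\sum_{b=1}^B \frac{1}{\pi(z_b)}\Big\|\sum_{k=1}^K \alpha_k\, a_k(z_b)\Big\|^2
  = \alpha^\top M\,\alpha,
\qquad M_{jk} = \frac{1}{B}\sum_{b=1}^B \frac{a_j(z_b)^\top a_k(z_b)}{\pi(z_b)},
\\[4pt]
\min_{\|\alpha\|_2 = 1}\ \alpha^\top M\,\alpha = \lambda_{\min}(M),
\ \text{attained at a unit eigenvector of } M \text{ for } \lambda_{\min}(M).
```

Since $M$ is symmetric and positive semidefinite, this is a Rayleigh-quotient minimization, so no learning rate or iteration schedule is involved.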

Figures (9)

  • Figure 1: Target probability distributions (black dashed curves) on the interval $[-1,1]$ (left), the unit circle (middle), and the real line (right), and their approximations by orthogonal function expansions from different families and of different orders; see eq. (eq:OF-1) and Table (tab:onedim).
  • Figure 2: Higher-order expansions may be required to approximate target distributions (black) that are not standardized. Left: approximation of a non-standardized Gaussian. Right: approximation of the mixture distribution in Figure 1 after translating its largest modes away from the origin.
  • Figure 3: 2D target distributions (column 1): a 3-component Gaussian mixture distribution (row 1), a funnel distribution (row 2), and a cross distribution (row 3). We report $\text{KL}(p;q)$ for the optimal variational distributions obtained by score-based VI with a Gaussian variational family (column 2) and with the EigenVI variational family (columns 3--5), where $K\!=\!K_1K_2$.
  • Figure 4: Sinh-arcsinh normal distribution synthetic target. Panel (a) shows the three targets we consider in 2D and their EigenVI fits. Panel (b) shows $\text{KL}(p;q)$ for $D=2$, and panel (c) shows $\text{KL}(p;q)$ for $D=5$; the $x$-axis shows the number of basis functions, $K\!=\!\prod_d K_d$.
  • Figure 5: Results on posteriordb models. Top three rows: marginal distributions of the even dimensions from 8-schools. Reference samples from HMC are outlined in gray, and the VI samples are in green. Bottom two rows: evaluation of methods with the (forward) Fisher divergence. The $x$-axis shows the number of basis functions, $K\!=\!\prod_{d} K_d$. Shaded regions represent standard errors computed over 5 random seeds.
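Figures 3 to 5 count basis functions as $K=\prod_d K_d$, which suggests a tensor-product basis in $D$ dimensions. Assuming that construction, and assuming 1D Hermite functions on each axis, the sketch below enumerates the multi-indices and checks numerically, with Gauss-Hermite quadrature, that the product basis stays orthonormal, so any unit-norm weight vector yields a normalized $q$. The helper names `hermite_normalized` and `tensor_basis_gram` and the parameter `n_quad` are hypothetical.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Hypothetical helpers; assume a tensor-product basis built from 1D orthonormal
# Hermite functions phi_k(x) = He_k(x) / sqrt(k!) * sqrt(N(x; 0, 1)).

def hermite_normalized(x, K):
    """He_k(x) / sqrt(k!) for k = 0..K-1, shape (K, len(x))."""
    h = np.zeros((K, x.size))
    h[0] = 1.0
    if K > 1:
        h[1] = x
    for k in range(1, K - 1):                  # normalized three-term recurrence
        h[k + 1] = (x * h[k] - np.sqrt(k) * h[k - 1]) / np.sqrt(k + 1)
    return h


def tensor_basis_gram(Ks, n_quad=30):
    """Multi-indices and Gram matrix of phi_k(z) = prod_d phi_{k_d}(z_d), with K = prod(Ks)."""
    x, w = hermegauss(n_quad)                  # nodes/weights for weight exp(-x^2 / 2)
    w = w / np.sqrt(2 * np.pi)                 # now integrates against N(x; 0, 1)
    per_axis = [hermite_normalized(x, K) for K in Ks]
    indices = list(itertools.product(*[range(K) for K in Ks]))   # (k_1, ..., k_D)
    G = np.empty((len(indices), len(indices)))
    for i, ki in enumerate(indices):
        for j, kj in enumerate(indices):
            # phi_k phi_l = h_k h_l N(x) on each axis, and the integral factorizes over d.
            G[i, j] = np.prod([np.sum(w * per_axis[d][ki[d]] * per_axis[d][kj[d]])
                               for d in range(len(Ks))])
    return indices, G


indices, G = tensor_basis_gram([3, 4])         # K_1 = 3, K_2 = 4
rng = np.random.default_rng(0)
alpha = rng.normal(size=len(indices))
alpha /= np.linalg.norm(alpha)                 # unit-norm weights
print(len(indices))                            # 12 = K_1 * K_2
print(alpha @ G @ alpha)                       # integral of q = psi^2; 1 up to rounding
```

Because the Gram matrix is the identity, the single constraint $\sum_k\alpha_k^2=1$ is enough for normalization in any dimension; the trade-off of this construction is that $K$ grows exponentially with $D$.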

Theorems & Definitions (2)

  • Lemma C.1
  • proof