Vector Copula Variational Inference and Dependent Block Posterior Approximations

Yu Fu, Michael Stanley Smith, Anastasios Panagiotelis

TL;DR

This paper addresses the loss of accuracy that arises when variational inference (VI) assumes independence between parameter blocks. It introduces VCVI, a framework that uses vector copulas to tie together heterogeneous, learnable marginals across blocks. It defines two learnable, scalable families, Gaussian vector copulas (GVC) and Kendall vector copulas (KVC), and supports two marginal transformation schemes (M1 and M2) implemented as normalizing flows to enable efficient stochastic gradient optimization. Across four example models and 16 datasets, VCVI delivers more accurate posterior approximations than block-independent or factor-based methods at modest additional computational cost, and a Python package is available. The approach is modular: users can tailor the marginals and the between-block dependence to the posterior structure, which makes it applicable to large-scale econometric and statistical models.

Abstract

The key to variational inference (VI) is the selection of a tractable density to approximate the Bayesian posterior. For large and complex models, a common choice is to assume independence between multivariate blocks in a partition of the parameter space. While this simplifies the problem, it can reduce accuracy. This paper proposes using vector copulas to capture dependence between the blocks parsimoniously. Tailored multivariate marginals are constructed using learnable transport maps. We call the resulting joint distribution a "dependent block posterior" approximation. Vector copula models are suggested that make tractable and flexible variational approximations. They allow for differing marginals, numbers of blocks, block sizes and forms of between-block dependence. They also allow the variational optimization to be solved using efficient stochastic gradient methods. The approach is demonstrated using four different statistical models and 16 datasets that have posteriors which are challenging to approximate. These include models that use global-local shrinkage priors for regularization, and hierarchical models for smoothing and heteroscedastic time series. In all cases, our method produces more accurate posterior approximations than benchmark VI methods that assume either block independence or factor-based dependence, at limited additional computational cost. A Python package implementing the method is available on GitHub at https://github.com/YuFuOliver/VCVI_Rep_PyPackage.
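To make the idea of a dependent block approximation concrete, here is a minimal NumPy sketch, loosely in the spirit of a Gaussian vector copula. It is not the paper's implementation or the API of the VCVI package. The names `gvc_correlation` and `sample_dependent_blocks` are hypothetical. The single cross-block parameter `rho` and the affine maps `mu + L z` are simplifying assumptions that stand in for the learnable dependence structure and the learnable transport maps. The sketch draws a latent Gaussian vector whose correlation matrix has identity diagonal blocks, so each latent block is marginally standard normal, and then pushes each block through its own map to obtain block-specific marginals.

```python
import numpy as np


def gvc_correlation(d1, d2, rho):
    """Correlation matrix for two blocks with identity diagonal blocks.

    Hypothetical helper: the cross-block has `rho` on its leading diagonal
    and zeros elsewhere, so the matrix is positive definite when |rho| < 1.
    """
    cross = np.zeros((d1, d2))
    k = min(d1, d2)
    cross[np.arange(k), np.arange(k)] = rho
    return np.block([[np.eye(d1), cross], [cross.T, np.eye(d2)]])


def sample_dependent_blocks(rng, n, mus, Ls, R):
    """Draw n samples of (theta1, theta2) with between-block dependence.

    Assumption: each block's marginal map is affine, theta_j = mu_j + L_j z_j,
    a stand-in for a learnable transport map.
    """
    d1 = mus[0].shape[0]
    chol = np.linalg.cholesky(R)
    # Latent draws z ~ N(0, R); each block of z is marginally N(0, I).
    z = rng.standard_normal((n, R.shape[0])) @ chol.T
    theta1 = mus[0] + z[:, :d1] @ Ls[0].T
    theta2 = mus[1] + z[:, d1:] @ Ls[1].T
    return theta1, theta2


rng = np.random.default_rng(0)
R = gvc_correlation(2, 3, rho=0.6)
mus = [np.zeros(2), np.ones(3)]
Ls = [np.eye(2), 2.0 * np.eye(3)]
theta1, theta2 = sample_dependent_blocks(rng, 20_000, mus, Ls, R)
```

Setting `rho=0` recovers a block-independent approximation with the same marginals. This illustrates the separation the abstract describes between the marginals of each block and the dependence between blocks.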

Paper Structure

This paper contains 20 sections, 62 equations, 5 figures, 5 tables, 2 algorithms.

Figures (5)

  • Figure 1: Posterior summaries for krkp (top row) and spam (bottom row). Panels (a,c) are heatmaps of the matrix of Spearman correlations for $(\bm{\alpha}^\top, \tilde{\bm{\delta}}^\top)$ computed exactly using (slow) MCMC. Panels (b,d) are scatterplots of $\{\hbox{corr}(\alpha_i,\tilde{\delta}_i); i=1,\ldots,m\}$, with exact posterior values on the horizontal axis, and variational approximations on the vertical axis for GC-F5 (black crosses) and A4 (orange dots).
  • Figure 2: Plot of the ELBO values for the dataset qsar ($n=8992, m=1024$) for GC-F5 (black line) and A4 (orange line) against (a) SGD step number, (b) wall clock time.
  • Figure 3: Comparison of two approximations, GC-F5 (black) and A4 (orange), for the regularized correlation matrix example with $r=10$. Panel (a) plots the ELBO values against SGD step. Panel (b) plots the exact posterior mean of $\bm{\beta}$ computed using MCMC (horizontal axis) against the variational posterior means (vertical axis). Closer alignment with the 45 degree line corresponds to higher accuracy.
  • Figure 4: Comparison of the $\bm{\mu}$ and $\bm{\zeta}$ from the UCSV U.S. inflation example. Posterior means from different methods are plotted. These are provided for the VAs GC-F5 (blue dotted) and GVC-F20 (red solid), along with the exact posterior computed using MCMC (black dashed). The shaded area represents the 90% interval derived from the MCMC draws.
  • Figure 5: Posterior mean estimates of $g_1$, $g_2$ and $g_3$. The true functions (grey dotted line), exact posterior means computed using MCMC (black dashed line) and variational means from GVC-F20 (red solid line) are plotted. The MCMC and GVC-F20 estimates are almost identical and difficult to distinguish visually. Because the functions are not unique up to an intercept in the additive model, we plot the centered functions $g_l(x) - g_l(0.5)$.

Theorems & Definitions (1)

  • proof