Predictive variational inference for flexible regression models

Lucas Kock, Scott A. Sisson, G. S. Rodrigues, David J. Nott

TL;DR

This work builds on an existing predictive variational inference (PVI) framework that not only improves prediction but also diagnoses model deficiencies through implicit model expansion. For models where the sampling density depends on the parameters through a linear predictor, it improves the interpretability of existing PVI methods as a diagnostic tool.

Abstract

A conventional Bayesian approach to prediction uses the posterior distribution to integrate out parameters in a density for unobserved data conditional on the observed data and parameters. When the true posterior is intractable, it is replaced by an approximation; here we focus on variational approximations. Recent work has explored methods that learn posteriors optimized for predictive accuracy under a chosen scoring rule, while regularizing toward the prior or conventional posterior. Our work builds on an existing predictive variational inference (PVI) framework that improves prediction, but also diagnoses model deficiencies through implicit model expansion. In models where the sampling density depends on the parameters through a linear predictor, we improve the interpretability of existing PVI methods as a diagnostic tool. This is achieved by adopting PVI posteriors of Gaussian mixture form (GM-PVI) and establishing connections with plug-in prediction for mixture-of-experts models. We make three main contributions. First, we show that GM-PVI prediction is equivalent to plug-in prediction for certain mixture-of-experts models with covariate-independent weights in generalized linear models and hierarchical extensions of them. Second, we extend standard PVI by allowing GM-PVI posteriors to vary with the prediction covariate; in this case, we establish an equivalence to plug-in prediction for mixtures of experts with covariate-dependent weights. Third, we demonstrate the diagnostic value of this approach across several examples, including generalized linear models, linear mixed models, and latent Gaussian process models, showing how the parameters of the original model must vary across the covariate space to achieve improvements in prediction.
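
As a rough illustration of why a Gaussian mixture posterior leads to mixture-type predictions when the sampling density depends on the parameters only through a linear predictor, the sketch below uses a Gaussian linear model with known noise. This is not the authors' code or their exact construction: the model, the toy values, and the function names (`normal_pdf`, `closed_form_density`, `monte_carlo_density`) are all assumptions made for illustration. It checks numerically that integrating the sampling density against a two-component Gaussian mixture over the coefficients gives a two-component mixture in the response with the same, covariate-independent weights.

```python
import numpy as np

# Hypothetical toy values, chosen only for illustration (not from the paper).
weights = np.array([0.3, 0.7])                        # mixture weights of q(theta)
means = [np.array([0.0, 1.0]), np.array([1.0, -0.5])]  # component means
covs = [0.2 * np.eye(2), np.array([[0.5, 0.1], [0.1, 0.3]])]  # component covariances
sigma = 0.5                                           # known noise s.d. (assumed)
x = np.array([1.0, 2.0])                              # prediction covariate
y = 1.5                                               # response value to evaluate


def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y (works elementwise on arrays)."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def closed_form_density(y, x, weights, means, covs, sigma):
    """Predictive density as a finite mixture with the posterior's weights.

    Under component k, the linear predictor x^T theta is Gaussian with mean
    x^T mu_k and variance x^T Sigma_k x; adding the noise variance gives the
    k-th predictive component.
    """
    total = 0.0
    for w, mu, cov in zip(weights, means, covs):
        total += w * normal_pdf(y, x @ mu, x @ cov @ x + sigma**2)
    return total


def monte_carlo_density(y, x, weights, means, covs, sigma, n=200_000, seed=0):
    """Predictive density by sampling theta from the mixture and averaging."""
    rng = np.random.default_rng(seed)
    labels = rng.choice(len(weights), size=n, p=weights)
    thetas = np.empty((n, len(x)))
    for k in range(len(weights)):
        idx = labels == k
        thetas[idx] = rng.multivariate_normal(means[k], covs[k], size=int(idx.sum()))
    eta = thetas @ x                                  # sampled linear predictors
    return normal_pdf(y, eta, sigma**2).mean()


closed = closed_form_density(y, x, weights, means, covs, sigma)
mc = monte_carlo_density(y, x, weights, means, covs, sigma)
print(closed, mc)
```

The two printed numbers should agree up to Monte Carlo error. For non-Gaussian responses the inner integral over the linear predictor generally has no closed form, but the outer mixture structure with the same weights still holds.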

Paper Structure (25 sections, 51 equations, 6 figures, 2 tables)


Figures (6)

  • Figure 1: Simulation -- Logistic regression. Posterior predictive distributions for A) $\beta=0.01$, B) $\beta=0.5$, C) $\beta=100$, and D) the true posterior. The observed data are shown as a scatter plot, where triangles denote $y=0$ and circles $y=1$. E) plots the llpd on the hold-out test data versus different values of $\beta$.
  • Figure 2: AIDS case counts. Panel A) shows the mean (bold) and a 95% credible interval (shaded) derived by VGM-PVI (red) and the true posterior (blue) as well as the observed data (points). Panel B) shows the deseasonalized temporal trend for both methods.
  • Figure 3: Temperature data. Water temperature over time at different depths (indicated by gray shades). Shown are A) the observed data, and the estimated mean functions $f(t)+b_j$ for the different depths under B) VGM-PVI and C) the conventional posterior. B) also shows the dominating mixture component under VGM-PVI at each point in time (colour bar). Different clusters are indicated by colour.
  • Figure 4: Lidar data. Posterior predictive distributions for A) $\beta=0.01$, B) $\beta=0.5$, C) $\beta=1.0$, and D) $\beta=100$. Gray lines correspond to quantiles at levels $\alpha=0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99$, and the points indicate the observed data.
  • Figure 5: Simulation -- Linear regression. Panel A) shows the mean (bold) and a 95% credible interval (dashed) derived by VGM-PVI under $\beta=0.01$ (red) and the true posterior (blue). Panel B) shows the behaviour of the weights $\omega_k(x)$ under VGM-PVI. Different weights are distinguished by line style.
  • ...and 1 more figures