Natural Gradient Hybrid Variational Inference with Application to Deep Mixed Models

Weiben Zhang; Michael Stanley Smith; Worapree Maneesoonthorn; Ruben Loaiza-Maya

TL;DR

This paper introduces Natural Gradient Hybrid Variational Inference (NG-HVI), a scalable variational inference framework for stochastic models with both global parameters and high-dimensional latent variables. By combining a hybrid VI scheme with stochastic natural gradient ascent, the method uses a fixed-form Gaussian variational approximation for global parameters and samples latent variables from their conditional posteriors, with a damped Fisher information matrix to stabilize updates. The authors demonstrate substantial gains in convergence speed and predictive accuracy on deep mixed models (DMMs) and a financial asset-pricing setting, outperforming ordinary-gradient hybrid VI and competing natural-gradient VI methods. The work provides practical evidence of NG-HVI’s efficiency in high-dimensional contexts and offers accessible MATLAB code for replication and extension.

Abstract

Stochastic models with global parameters and latent variables are common, and variational inference (VI) is a popular way to estimate them. However, existing methods are often either slow or inaccurate in high dimensions. We suggest a fast and accurate VI method for this case that employs a well-defined natural gradient variational optimization that targets the joint posterior of the global parameters and latent variables. It is a hybrid method, where at each step the global parameters are updated using the natural gradient and the latent variables are generated from their conditional posterior. A fast-to-compute expression for the Tikhonov damped Fisher information matrix is used, along with the re-parameterization trick, to provide a stable natural gradient. We apply the approach to deep mixed models, which are an emerging class of Bayesian neural networks with random output layer coefficients to allow for heterogeneity. A range of simulations show that using the natural gradient is substantially more efficient than using the ordinary gradient, and that the approach is faster and more accurate than two cutting-edge natural gradient VI methods. In a financial application we show that accounting for industry level heterogeneity using the deep mixed model improves the accuracy of asset pricing models. MATLAB code to implement the method can be found at: https://github.com/WeibenZhang07/NG-HVI.
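The hybrid scheme described in the abstract can be illustrated with a small sketch. This is not the authors' MATLAB implementation; it is a toy Python/NumPy example under stated assumptions: a random-intercept regression with a single slope `beta`, all variances fixed at 1, a N(0, 100) prior on `beta`, a Gaussian approximation parameterized by `(mu, log_s)`, and the standard re-parameterized hybrid gradient in which only the joint density and the marginal approximation contribute. The names `sample_alpha`, `grad_log_joint`, `exact_posterior`, `ng_hvi`, and the step-norm clipping (`max_norm`) are hypothetical choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed sizes): y_ij = beta * x_ij + alpha_i + e_ij, unit variances.
K, n, beta_true, prior_var = 30, 5, 2.0, 100.0
x = rng.normal(size=(K, n))
y = beta_true * x + rng.normal(size=(K, 1)) + rng.normal(size=(K, n))

def sample_alpha(beta):
    """Draw the latent intercepts from p(alpha | beta, y)."""
    prec = n + 1.0  # n likelihood terms plus the N(0, 1) prior on alpha_i
    mean = (y - beta * x).sum(axis=1) / prec
    return mean + rng.normal(size=K) / np.sqrt(prec)

def grad_log_joint(beta, alpha):
    """Derivative of log p(y | alpha, beta) + log p(beta) with respect to beta."""
    resid = y - beta * x - alpha[:, None]
    return (x * resid).sum() - beta / prior_var

def exact_posterior():
    """Closed-form p(beta | y) with alpha integrated out, used only as a check."""
    sx, sy = x.sum(axis=1), y.sum(axis=1)
    prec = (x**2).sum() - (sx**2).sum() / (n + 1) + 1.0 / prior_var
    mean = ((x * y).sum() - (sx * sy).sum() / (n + 1)) / prec
    return mean, 1.0 / np.sqrt(prec)

def ng_hvi(steps=4000, lr=0.05, damping=0.1, max_norm=1.0):
    """Damped natural gradient ascent on lambda = (mu, log_s)."""
    mu, log_s = 0.0, np.log(0.1)
    for _ in range(steps):
        s = np.exp(log_s)
        eps = rng.normal()
        beta = mu + s * eps              # re-parameterization trick
        alpha = sample_alpha(beta)       # latent draw from its conditional posterior
        # grad_theta log joint minus grad_theta log q0, where the latter is -eps / s
        diff = grad_log_joint(beta, alpha) + eps / s
        grad = np.array([diff, diff * s * eps])   # chain rule through beta
        fisher = np.array([1.0 / s**2, 2.0])      # diagonal Fisher of N(mu, s^2)
        step = grad / (fisher + damping)          # Tikhonov damping: F + delta * I
        # Norm clipping is an extra safeguard assumed for this sketch.
        step *= min(1.0, max_norm / max(np.linalg.norm(step), 1e-12))
        mu, log_s = mu + lr * step[0], log_s + lr * step[1]
    return mu, np.exp(log_s)

mu, s = ng_hvi()
post_mean, post_sd = exact_posterior()
print(f"NG-HVI: mean={mu:.3f}, sd={s:.3f}; exact: mean={post_mean:.3f}, sd={post_sd:.3f}")
```

Because every variance is known in this toy model, the marginal posterior of `beta` is Gaussian, so the fitted mean and standard deviation can be compared directly with the closed form. In realistic models such as DMMs no closed form is available and the conditional draw of the latent variables would typically need an MCMC step.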

Paper Structure (23 sections, 2 theorems, 42 equations, 10 figures, 6 tables, 3 algorithms)

Introduction
Hybrid Variational Inference
Variational inference
Variational inference for latent variable models
Hybrid variational inference
Natural Gradient Hybrid Variational Inference
Natural gradient ascent
Fixed form approximation
Example 1: Linear regression with random effect
Hybrid Variational Inference for Deep Mixed Models
Deep mixed models
NG-HVI for DMM
Example 2: Gaussian DMM
Example 2(a): Smaller model
Example 2(b): Larger model
...and 8 more sections

Key Result

Theorem 1

Let $q_{\bm{\lambda}}(\bm{\psi}) = p(\bm{z}|\bm{\theta},\bm{y})\, q^0_{\bm{\lambda}}(\bm{\theta})$ and denote the Fisher information matrix of the marginal approximation $q^0_{\bm{\lambda}}(\bm{\theta})$ as $F^0(\bm{\lambda}) = E_{q_{\bm{\lambda}}}[\ldots]$ ...

Figures (10)

Figure 1: Plots of the noisy ELBO function for the linear random effects regression with $\sigma^2_\epsilon = \sigma^2_\alpha = 1$ and $K = 1000$ for Example 1. Panel (a) plots against optimization step number, and panel (b) plots against wall clock time (seconds). The results for DAVI are plotted as a dotted red line, SG-HVI as a dash-dot blue line, and NG-HVI as a solid black line. The average of the noisy ELBO function values over the last 100 steps is also reported.
Figure 2: Simulation results from the Gaussian DMM in Example 2(a). Panel (a) depicts convergence of the noisy ELBO against optimization step number. Panel (b) depicts boxplots of the out-of-sample predictive $R^2$ (displayed as a ratio of those from the HVI methods over those from DAVI) resulting from 100 repeated simulated datasets.
Figure 3: Plots of the noisy ELBO values against optimization step number for the larger Gaussian DMM in Example 2(b).
Figure 4: Simulation results for the Bernoulli DMM in Example 3.
Figure 5: Lek profile depicting the heterogeneous responses implied by the DMM asset pricing model. The first row uses inputs from July 2005 (low market volatility month), the second row uses inputs from May 2012 (median market volatility month), and the last row uses inputs from October 2008 (extreme market volatility month).
...and 5 more figures

Theorems & Definitions (2)

Theorem 1: Fisher information matrix for hybrid VI
Corollary 1.1: Natural gradient for hybrid VI
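
For orientation, the generic form of a Tikhonov-damped natural gradient step is written out here. This is the textbook update, not a restatement of Theorem 1 or Corollary 1.1, whose exact expressions are not reproduced on this page. The symbols $\delta$ (damping constant), $\rho_t$ (step size) and $\mathcal{L}$ (the ELBO) are notation assumed for this illustration.

$$
\tilde{F}(\bm{\lambda}) = F(\bm{\lambda}) + \delta I, \qquad
\bm{\lambda}^{(t+1)} = \bm{\lambda}^{(t)} + \rho_t\, \tilde{F}(\bm{\lambda}^{(t)})^{-1}\, \nabla_{\bm{\lambda}} \mathcal{L}(\bm{\lambda}^{(t)})
$$

With $\delta > 0$ the damped matrix $\tilde{F}$ is positive definite whenever $F$ is positive semi-definite, so the inverse exists even when $F$ itself is near-singular.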
