Fast Gradient-Based Inference with Continuous Latent Variable Models in Auxiliary Form

Diederik P Kingma

TL;DR

The paper tackles the challenge of efficient gradient-based inference in Bayesian networks with many continuous latent layers, where exact inference is intractable. It introduces an auxiliary-form transformation that replaces latent variables with conditionally deterministic counterparts and auxiliary variables, yielding larger Markov blankets and faster gradient updates. The method is shown to be equivalent to the original model after marginalization and is implemented with practical steps, including generating functions and inversion-based sampling schemes. Empirical results on MNIST-based generative networks and dynamical Bayesian networks demonstrate substantial speedups in MAP inference, validating the approach and its potential for scalable gradient-based learning in deep latent structures.

Abstract

We propose a technique for increasing the efficiency of gradient-based inference and learning in Bayesian networks with multiple layers of continuous latent variables. We show that, in many cases, it is possible to express such models in an auxiliary form, where continuous latent variables are conditionally deterministic given their parents and a set of independent auxiliary variables. Variables of models in this auxiliary form have much larger Markov blankets, leading to significant speedups in gradient-based inference, e.g. rapid mixing Hybrid Monte Carlo and efficient gradient-based optimization. The relative efficiency is confirmed in experiments.

Paper Structure

This paper contains 24 sections, 14 equations, 3 figures.

Figures (3)

  • Figure 1: (a) A continuous latent variable $Z_j$ with parents $\mathbf{Pa}_j$ and a conditional distribution $p_{\boldsymbol{\theta}}(Z_j|\mathbf{Pa}_j)$. (b) The auxiliary form, in which each $Z_j$ is replaced by $\widetilde{Z}_j$ (with parents $\widetilde{\mathbf{Pa}}_j$), where $\widetilde{Z}_j = g_Z(\widetilde{\mathbf{Pa}}_j, E_j, \boldsymbol{\theta})$ with auxiliary latent variable $E_j \sim p_{\boldsymbol{\theta}}(E_j)$, such that $Z_j|\mathbf{Pa}_j$ equals $\widetilde{Z}_j|\widetilde{\mathbf{Pa}}_j$ in distribution. The diamond indicates a conditionally deterministic variable: the value of $\widetilde{Z}_j$ is deterministic only when conditioned on both $\widetilde{\mathbf{Pa}}_j$ and $E_j$.
  • Figure 2: (a) A basic illustrative Bayesian network with three continuous latent variables and three observed variables, representing $p_{\boldsymbol{\theta}}(X_1, X_2, X_3, Z_1, Z_2, Z_3)$. (b) The auxiliary form with conditionally deterministic variables $\widetilde{Z}_1$, $\widetilde{Z}_2$ and $\widetilde{Z}_3$, chosen such that $\widetilde{Z}_1 = g_1(E_1, \boldsymbol{\theta})$, $\widetilde{Z}_2 = g_2(\widetilde{Z}_1, E_2, \boldsymbol{\theta})$ and $\widetilde{Z}_3 = g_3(\widetilde{Z}_2, E_3, \boldsymbol{\theta})$, with auxiliary latent variables $E_2 \sim p_{\boldsymbol{\theta}}(E_2)$ and $E_3 \sim p_{\boldsymbol{\theta}}(E_3)$.
  • Figure 3: Left: convergence of log-likelihood for the generative MNIST problem. Right: convergence of log-likelihood for the dynamic Bayesian network (DBN).
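
The construction described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes a one-dimensional linear-Gaussian conditional $p_{\boldsymbol{\theta}}(Z|\mathbf{Pa}) = \mathcal{N}(w \cdot \mathbf{Pa}, \sigma^2)$ and a standard-normal auxiliary variable $E$; this particular conditional, along with the names `sample_original`, `g`, `w` and `sigma`, is a hypothetical choice for illustration and is not taken from the paper. The sketch only checks that sampling $Z$ directly and computing $\widetilde{Z} = g(\mathbf{Pa}, E, \boldsymbol{\theta})$ from sampled noise give matching first and second moments.

```python
import numpy as np


def sample_original(pa, w, sigma, rng, size):
    """Original form: draw Z ~ N(w * pa, sigma^2) directly from the conditional."""
    return rng.normal(loc=w * pa, scale=sigma, size=size)


def g(pa, eps, w, sigma):
    """Auxiliary form: Z~ is a deterministic function of its parent and the noise eps.

    Assumption for this sketch: eps ~ N(0, 1), so that Z~ | pa ~ N(w * pa, sigma^2).
    """
    return w * pa + sigma * eps


rng = np.random.default_rng(0)
pa, w, sigma, n = 1.5, 0.8, 0.5, 200_000  # illustrative values only

# Original form: sample the latent variable itself.
z_direct = sample_original(pa, w, sigma, rng, n)

# Auxiliary form: sample the auxiliary variable, then compute Z~ deterministically.
eps = rng.standard_normal(n)
z_aux = g(pa, eps, w, sigma)

print("direct    mean/std:", z_direct.mean(), z_direct.std())
print("auxiliary mean/std:", z_aux.mean(), z_aux.std())
```

Because `g` is differentiable in `pa` and in the parameters, a gradient taken with respect to an auxiliary variable passes through every downstream deterministic variable. That is one way to read the caption's remark that $\widetilde{Z}_j$ is deterministic only once both its parents and $E_j$ are given.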