A Tutorial on Deep Latent Variable Models of Natural Language

Yoon Kim, Sam Wiseman, Alexander M. Rush

TL;DR

This tutorial addresses how to integrate latent variable modeling with deep neural architectures for natural language, using variational inference as the core learning framework. It characterizes three archetypal latent-variable families (discrete, continuous, and structured discrete) and discusses how to make them "deep" with neural parameterizations. It covers exact and approximate learning, amortized inference via VAEs, and techniques for tightening the ELBO (e.g., flows, IWAE), along with practical issues such as posterior collapse and evaluation. Together, these techniques aim to enable principled, interpretable, and scalable modeling of multimodal linguistic phenomena with latent structure.

Abstract

There has been much recent, exciting work on combining the complementary strengths of latent variable models and deep learning. Latent variable modeling makes it easy to explicitly specify model constraints through conditional independence properties, while deep learning makes it possible to parameterize these conditional likelihoods with powerful function approximators. While these "deep latent variable" models provide a rich, flexible framework for modeling many real-world phenomena, difficulties exist: deep parameterizations of conditional likelihoods usually make posterior inference intractable, and latent variable objectives often complicate backpropagation by introducing points of non-differentiability. This tutorial explores these issues in depth through the lens of variational inference.

Paper Structure

This paper contains 54 sections, 88 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Graphical model corresponding to an RNN language model.
  • Figure 2: Naive Bayes graphical model. For simplicity, all sequences are depicted as having $T$ tokens. All distributions are categorical, and the parameters are $\boldsymbol{\mu} \in \Delta^{K-1}$ and $\pi = \{\boldsymbol{\pi}_k \in \Delta^{V-1}\}_{k=1}^K$.
  • Figure 3: Graphical model representation of a categorical latent variable model with tokens generated by an RNN. For simplicity, all sequences are depicted as having $T$ tokens. The $z^{(n)}$s are drawn from a Categorical distribution with parameter $\boldsymbol{\mu}$, while $x^{(n)}$ is drawn from an $\mathop{\mathrm{RNNLM}}\limits(x ; \, \pi_{z^{(n)}})$. These $\mathop{\mathrm{RNNLM}}\limits$s have parameters $\pi = \{\pi_k\}_{k=1}^K$.
  • Figure 4: Continuous Naive Bayes model. The $\mathbf{z}^{(n)}$ have a normal distribution $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{I})$, and each token $x^{(n)}_t$ has a Categorical distribution with parameter $\mathop{\mathrm{softmax}}\limits(\boldsymbol{W} \mathbf{z}^{(n)})$. (For consistency with the previous models, we let $\pi = \{\boldsymbol{W}\}$.) Note that the dependence structure is identical to that in Figure 2; the only difference is the type of latent variable and the parameterizations.
  • Figure 5: Continuous latent variable model with tokens generated by an RNN. The $\mathbf{z}^{(n)}$ have a normal distribution $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{I})$, and the $x^{(n)}$ have a $\mathop{\mathrm{CRNNLM}}\limits(x^{(n)}_{1:T} ; \, \pi, \mathbf{z}^{(n)})$ distribution, where $\pi$ contains the parameters of the $\mathop{\mathrm{CRNNLM}}\limits$. Note that the dependence structure is identical to that in Figure 3; the only difference is the type of latent variable and the parameterizations.
  • ...and 4 more figures
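
The generative stories in the captions of Figures 2 and 4 can be illustrated with a minimal ancestral-sampling sketch. This is not code from the tutorial: the function names (`sample_naive_bayes`, `sample_continuous_naive_bayes`), the toy parameter values, and the choice of plain standard-library Python are all assumptions made for illustration. Token and cluster identities are represented as integer indices, and the RNN-based models of Figures 3 and 5 are not covered.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_categorical(probs, rng):
    """Draw one index from a categorical distribution with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_naive_bayes(mu, pi, T, rng):
    """Figure 2 (hypothetical helper): z ~ Cat(mu), then x_t ~ Cat(pi[z]) for t = 1..T."""
    z = sample_categorical(mu, rng)
    x = [sample_categorical(pi[z], rng) for _ in range(T)]
    return z, x


def sample_continuous_naive_bayes(mu, W, T, rng):
    """Figure 4 (hypothetical helper): z ~ N(mu, I), then x_t ~ Cat(softmax(W z))."""
    # Identity covariance means each coordinate of z is an independent unit-variance Gaussian.
    z = [rng.gauss(m, 1.0) for m in mu]
    # W has one row per vocabulary item, so W z yields one logit per token type.
    logits = [sum(w * zi for w, zi in zip(row, z)) for row in W]
    probs = softmax(logits)
    x = [sample_categorical(probs, rng) for _ in range(T)]
    return z, x


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy settings (assumed): K = 2 clusters, V = 3 token types, T = 5 tokens.
    mu_cat = [0.5, 0.5]
    pi = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
    print(sample_naive_bayes(mu_cat, pi, T=5, rng=rng))
    # Toy settings (assumed): 2-dimensional latent vector, V = 3 token types.
    mu_cont = [0.0, 0.0]
    W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    print(sample_continuous_naive_bayes(mu_cont, W, T=5, rng=rng))
```

In both functions the tokens are conditionally independent given the latent variable, which is the shared dependence structure the Figure 4 caption points to; only the type of latent variable and the way it parameterizes the token distribution differ.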