A Tutorial on Deep Latent Variable Models of Natural Language
Yoon Kim, Sam Wiseman, Alexander M. Rush
TL;DR
This tutorial addresses how to combine latent variable modeling with deep neural architectures for natural language, using variational inference as the core learning framework. It characterizes three archetypal latent-variable families (discrete, continuous, and structured discrete) and discusses how to make them “deep” with neural parameterizations. It covers exact and approximate learning, amortized inference via VAEs, and techniques for tightening the ELBO (e.g., flows, IWAE), along with practical issues such as posterior collapse and evaluation. Together, these tools support principled, interpretable, and scalable modeling of multimodal linguistic phenomena with latent structure.
Abstract
There has been much recent, exciting work on combining the complementary strengths of latent variable models and deep learning. Latent variable modeling makes it easy to explicitly specify model constraints through conditional independence properties, while deep learning makes it possible to parameterize these conditional likelihoods with powerful function approximators. While these "deep latent variable" models provide a rich, flexible framework for modeling many real-world phenomena, difficulties exist: deep parameterizations of conditional likelihoods usually make posterior inference intractable, and latent variable objectives often complicate backpropagation by introducing points of non-differentiability. This tutorial explores these issues in depth through the lens of variational inference.
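To make the variational-inference framing concrete, here is a minimal sketch of the evidence lower bound (ELBO) on a toy model with a single discrete latent variable. The probabilities and the function names (`log_marginal`, `exact_posterior`, `elbo`) are hypothetical and chosen only for illustration; they are not taken from the tutorial. Because the latent variable takes three values, the exact log marginal likelihood can be computed by enumeration and compared against the bound.

```python
import math

# Hypothetical toy model (illustrative numbers, not from the tutorial):
# one discrete latent z in {0, 1, 2} and a single fixed observation x.
prior = [0.5, 0.3, 0.2]          # p(z)
likelihood = [0.10, 0.40, 0.70]  # p(x | z) evaluated at the observed x


def log_marginal(prior, likelihood):
    """Exact log p(x) = log sum_z p(z) p(x | z), by enumerating z."""
    return math.log(sum(p * l for p, l in zip(prior, likelihood)))


def exact_posterior(prior, likelihood):
    """Exact posterior p(z | x) via Bayes' rule."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]


def elbo(q, prior, likelihood):
    """ELBO(q) = E_q[log p(x, z) - log q(z)]; zero-probability terms are skipped."""
    return sum(
        qz * (math.log(p * l) - math.log(qz))
        for qz, p, l in zip(q, prior, likelihood)
        if qz > 0
    )


log_px = log_marginal(prior, likelihood)

# A crude variational distribution gives a strict lower bound on log p(x).
elbo_uniform = elbo([1 / 3, 1 / 3, 1 / 3], prior, likelihood)

# Setting q to the exact posterior closes the gap (up to floating-point error).
elbo_posterior = elbo(exact_posterior(prior, likelihood), prior, likelihood)

print(f"log p(x)          = {log_px:.4f}")
print(f"ELBO (uniform q)  = {elbo_uniform:.4f}")
print(f"ELBO (posterior)  = {elbo_posterior:.4f}")
```

The gap between log p(x) and the ELBO is the KL divergence from q to the true posterior, so it vanishes only when q matches the posterior. With deep parameterizations the enumeration in `log_marginal` is usually unavailable, which is why the bound is optimized instead.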
