A Recurrent Latent Variable Model for Sequential Data
Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron Courville, Yoshua Bengio
TL;DR
The paper addresses modelling highly structured sequential data by introducing a variational recurrent neural network (VRNN) that embeds a latent random variable at each timestep. Each latent is drawn from a prior conditioned on the previous RNN hidden state. The generation network is conditioned on the latent and that hidden state, and the inference network on the current observation and that hidden state, so the whole sequence is trained with a variational lower bound summed over timesteps. Empirically, VRNNs (with Gaussian and Gaussian mixture observation models) achieve higher log-likelihoods than strong RNN baselines on speech and handwriting tasks; the temporally conditioned prior improves performance, generated speech is less noisy, and generated handwriting keeps a more consistent style. The approach offers a principled way to capture multimodal, temporally coherent variability in sequential data, with potential impact on speech synthesis and other structured sequence domains.
Abstract
In this paper, we explore the inclusion of latent random variables into the dynamic hidden state of a recurrent neural network (RNN) by combining elements of the variational autoencoder. We argue that through the use of high-level latent random variables, the variational RNN (VRNN) can model the kind of variability observed in highly structured sequential data such as natural speech. We empirically evaluate the proposed model against related sequential models on four speech datasets and one handwriting dataset. Our results show the important roles that latent random variables can play in the RNN dynamic hidden state.
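
To make the idea of a latent variable inside the recurrent state concrete, the following is a minimal NumPy sketch of one VRNN timestep and the resulting per-sequence lower bound. It is an illustration under stated assumptions, not the authors' implementation: the networks are untrained random affine maps, all distributions are diagonal Gaussians, a plain tanh recurrence stands in for whatever recurrent unit is actually used, and every name and size here (`linear`, `vrnn_step`, `sequence_elbo`, `X_DIM`, `Z_DIM`, `H_DIM`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X_DIM, Z_DIM, H_DIM = 3, 2, 4  # hypothetical sizes, chosen only for illustration


def linear(in_dim, out_dim):
    """Random affine map standing in for a trained network (assumption)."""
    W = rng.normal(scale=0.1, size=(in_dim, out_dim))
    b = np.zeros(out_dim)
    return lambda v: v @ W + b


# Each distribution network outputs a mean and a log-variance, concatenated.
prior_net = linear(H_DIM, 2 * Z_DIM)              # p(z_t | h_{t-1})
encoder_net = linear(X_DIM + H_DIM, 2 * Z_DIM)    # q(z_t | x_t, h_{t-1})
decoder_net = linear(Z_DIM + H_DIM, 2 * X_DIM)    # p(x_t | z_t, h_{t-1})
recurrence = linear(X_DIM + Z_DIM + H_DIM, H_DIM)  # h_t from (x_t, z_t, h_{t-1})


def split(params):
    """Split a parameter vector into (mean, log-variance) halves."""
    mean, log_var = np.split(params, 2)
    return mean, log_var


def gaussian_kl(mu_q, lv_q, mu_p, lv_p):
    """KL(q || p) between diagonal Gaussians, summed over dimensions."""
    return 0.5 * np.sum(
        lv_p - lv_q + (np.exp(lv_q) + (mu_q - mu_p) ** 2) / np.exp(lv_p) - 1.0
    )


def gaussian_log_prob(x, mu, lv):
    """Log-density of x under a diagonal Gaussian, summed over dimensions."""
    return -0.5 * np.sum(np.log(2 * np.pi) + lv + (x - mu) ** 2 / np.exp(lv))


def vrnn_step(x_t, h_prev):
    """One timestep: returns the new hidden state and this step's bound term."""
    mu_p, lv_p = split(prior_net(h_prev))
    mu_q, lv_q = split(encoder_net(np.concatenate([x_t, h_prev])))
    # Reparameterised sample from the approximate posterior.
    z_t = mu_q + np.exp(0.5 * lv_q) * rng.standard_normal(Z_DIM)
    mu_x, lv_x = split(decoder_net(np.concatenate([z_t, h_prev])))
    elbo_t = gaussian_log_prob(x_t, mu_x, lv_x) - gaussian_kl(mu_q, lv_q, mu_p, lv_p)
    # The hidden state absorbs both the observation and the sampled latent.
    h_t = np.tanh(recurrence(np.concatenate([x_t, z_t, h_prev])))
    return h_t, elbo_t


def sequence_elbo(xs):
    """Sum the per-timestep bound terms over a sequence of observations."""
    h = np.zeros(H_DIM)
    total = 0.0
    for x_t in xs:
        h, elbo_t = vrnn_step(x_t, h)
        total += elbo_t
    return total
```

Calling `sequence_elbo(np.ones((5, X_DIM)))` returns a single scalar estimate of the bound for a five-step toy sequence. The point of the sketch is the data flow: the prior over the latent depends on the previous hidden state, and the sampled latent feeds back into the next hidden state, which is what distinguishes this from applying a standard variational autoencoder independently at each timestep.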
