Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming

Xinlei Niu, Christian Walder, Jing Zhang, Charles Patrick Martin

TL;DR

The stochastic optimal path solves the classical optimal path problem by a probability-softening solution and provides all the ingredients required for variational Bayesian inference of a latent path, namely Bayesian dynamic programming (BDP).

Abstract

We propose the stochastic optimal path, which solves the classical optimal path problem by a probability-softening solution. This unified approach transforms a wide range of DP problems into directed acyclic graphs (DAGs) in which all paths follow a Gibbs distribution. Using the properties of the Gumbel distribution, we show that this Gibbs distribution is equivalent to a message-passing algorithm, and we give all the ingredients required for variational Bayesian inference of a latent path, namely Bayesian dynamic programming (BDP). We demonstrate the use of BDP in the latent space of variational autoencoders (VAEs) and propose the BDP-VAE, which captures structured sparse optimal paths as latent variables. This enables end-to-end training for generative tasks in which models rely on unobserved structural information. Finally, we validate the behavior of our approach and showcase its applicability in two real-world applications: text-to-speech and singing voice synthesis. Our implementation code is available at https://github.com/XinleiNIU/LatentOptimalPathsBayesianDP.
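To make the idea of a Gibbs distribution over DAG paths concrete, the following is a minimal sketch and not the authors' implementation. It assumes that nodes are numbered in topological order, that a path's weight is the sum of its edge weights, and that a path $\mathbf y$ has probability proportional to $\exp(\alpha \cdot \text{weight}(\mathbf y))$; the function names (`forward_log_partition`, `sample_path`) and the sign convention for $\alpha$ are illustrative assumptions. A forward log-sum-exp pass computes the log-partition at every node, and a backward pass samples one edge at a time, which yields an exact sample from that distribution.

```python
import math
import random
from collections import defaultdict


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_partition(n_nodes, edges, alpha):
    """Forward pass over a DAG whose nodes 0..n_nodes-1 are topologically ordered.

    edges maps (u, v) with u < v to an edge weight (hypothetical input format).
    Returns (log_z, parents), where log_z[v] is the log of the sum of
    exp(alpha * path_weight) over all paths from node 0 to node v.
    """
    parents = defaultdict(list)
    for (u, v), w in edges.items():
        parents[v].append((u, w))
    log_z = [-math.inf] * n_nodes
    log_z[0] = 0.0  # the empty path from the source to itself
    for v in range(1, n_nodes):
        terms = [log_z[u] + alpha * w for u, w in parents[v] if log_z[u] > -math.inf]
        if terms:
            log_z[v] = log_sum_exp(terms)
    return log_z, parents


def sample_path(n_nodes, edges, alpha, rng):
    """Sample a path from node 0 to node n_nodes-1 with P(path) ∝ exp(alpha * weight).

    Assumes the last node is reachable from node 0.
    """
    log_z, parents = forward_log_partition(n_nodes, edges, alpha)
    v = n_nodes - 1
    path = [v]
    while v != 0:
        candidates = [(u, w) for u, w in parents[v] if log_z[u] > -math.inf]
        # Conditional probability of arriving at v through parent u.
        probs = [math.exp(log_z[u] + alpha * w - log_z[v]) for u, w in candidates]
        u = rng.choices([c[0] for c in candidates], weights=probs)[0]
        path.append(u)
        v = u
    return path[::-1]


# Toy 4-node DAG (weights chosen arbitrarily for illustration).
toy_edges = {(0, 1): 1.0, (0, 2): 0.5, (1, 2): 0.3, (1, 3): 1.0, (2, 3): 2.0, (0, 3): 0.2}
toy_log_z, _ = forward_log_partition(4, toy_edges, alpha=1.0)
toy_path = sample_path(4, toy_edges, alpha=1.0, rng=random.Random(0))
```

Under these assumptions, a larger `alpha` concentrates probability on the highest-weight path, while `alpha` close to zero spreads it nearly uniformly over all paths.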

Paper Structure

This paper contains 51 sections, 11 theorems, 55 equations, 16 figures, 3 tables, 6 algorithms.

Key Result

Lemma 4.2

Let where for all $\mathbf y \in \mathcal{Y}(1,N)$, Then the probability of $Y=\mathbf y$ is given by eqn:gibbsdistribution.

Figures (16)

  • Figure 1: The BDP-VAE pipeline. BDP-VAE captures the unobserved sparse structural dependency (i.e., optimal paths on a DAG) in the latent space while the model is being trained, and allows gradient-based optimization for learning the edge weights $\mathbf{W}$.
  • Figure 2: Toy experiments on BDP to find stochastic optimal paths under randomly generated DAGs. The first row shows a 5-node DAG and its density plots for different values of $\alpha$. The second row shows an 8-node DAG and its density plots for different values of $\alpha$.
  • Figure 3: Inference F0 trajectory comparison with VAENAR-TTS for the utterance "I suppose I have many thoughts." The intonation of BDPVAE-TTS is close to the GT, indicating that sparse optimal paths, with their approximated durations, help the decoder better understand how each phoneme contributes to the overall utterance.
  • Figure 4: Visualization of GT, synthesized singing voice spectrogram, and latent optimal path from the prior encoder. The GT and generated spectrogram are almost identical, and the generated spectrogram has a similar temporal structure to the inferred latent optimal path.
  • Figure 5: Visualization, for two audio clips, of the GT alignment between phoneme tokens and spectrogram frames, the latent optimal paths from the encoder, and the optimal paths from a random latent space. BDP-VAE achieves alignments closer to the GT, indicating its effectiveness in finding latent optimal paths.
  • ...and 11 more figures

Theorems & Definitions (14)

  • Definition 4.1
  • Lemma 4.2
  • Lemma 4.3
  • Lemma 4.4
  • Lemma 4.5
  • Corollary 4.6
  • Corollary 4.7
  • Definition 4.8
  • Lemma 4.9
  • Lemma 4.10
  • ...and 4 more