GANS for Sequences of Discrete Elements with the Gumbel-softmax Distribution

Matt J. Kusner, José Miguel Hernández-Lobato

TL;DR

The paper tackles the difficulty of applying GANs to sequences of discrete elements by adopting the Gumbel-softmax as a differentiable approximation for sampling from a multinomial. It deploys an LSTM-based generator and discriminator and trains them with an adversarial procedure that leverages differentiable sampling to backpropagate through discrete outputs. Through experiments on a context-free grammar task, the authors show that Gumbel-softmax-enabled GANs can produce realistic discrete sequences and demonstrate the effect of temperature annealing on training. They suggest future directions such as variational divergence minimization and density ratio estimation to further improve performance on discrete data.

Abstract

Generative Adversarial Networks (GAN) have limitations when the goal is to generate sequences of discrete elements. The reason for this is that samples from a distribution on discrete objects such as the multinomial are not differentiable with respect to the distribution parameters. This problem can be avoided by using the Gumbel-softmax distribution, which is a continuous approximation to a multinomial distribution parameterized in terms of the softmax function. In this work, we evaluate the performance of GANs based on recurrent neural networks with Gumbel-softmax output distributions in the task of generating sequences of discrete elements.
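
As an illustration of the relaxation the abstract describes, the following is a minimal Python sketch of drawing one Gumbel-softmax sample. It assumes the standard construction: add Gumbel(0, 1) noise to the logits, divide by a temperature, and apply a softmax. The function name `gumbel_softmax_sample`, its parameters, and the example logits are hypothetical choices made for this sketch and are not taken from the paper's code.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng=random):
    """Return one relaxed one-hot sample for the given unnormalized log-probabilities.

    Assumed (hypothetical) interface: `logits` is a list of floats,
    `temperature` is a positive float, `rng` provides a `random()` method.
    """
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise by inverse transform: g = -log(-log(u)), u ~ Uniform(0, 1).
        u = max(rng.random(), 1e-12)  # guard against u == 0
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / temperature)

    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Example: a three-class distribution with probabilities 0.7, 0.2, 0.1.
example_logits = [math.log(0.7), math.log(0.2), math.log(0.1)]
smooth = gumbel_softmax_sample(example_logits, temperature=5.0)
sharp = gumbel_softmax_sample(example_logits, temperature=0.1)
print(smooth, sharp)
```

The output is a differentiable function of the logits, which is what allows gradients to pass through the sampling step. As the temperature approaches zero the sample approaches a one-hot vector whose largest entry falls on each class with the corresponding softmax probability; at higher temperatures the sample is smoother.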

Paper Structure

This paper contains 4 sections, 5 equations, 4 figures, 1 algorithm.

Figures (4)

  • Figure 1: Models to generate simple one-variable arithmetic sequences. (Top): The classic LSTM model during the prediction phase. Each LSTM unit (shown as a blue box) makes a prediction based on the input it has seen in the past. This prediction is then used as input to the next unit, which makes its own prediction, and so on. (Bottom): Our generative model for discrete sequences. At the beginning we draw a pair of samples which are fed into the network in place of the initial cell state $C_0$ and hidden state $h_0$. Our trained network takes these samples and uses them to generate an initial character; this generated character is fed to the next cell in the LSTM as input, and so on.
  • Figure 2: The adversarial training procedure. Our generative model first generates a full-length sequence. This sequence is fed to the discriminator (also an LSTM), which predicts the probability of it being a real sequence. Additionally (not shown), the discriminator is fed real discrete sequence data, for which it likewise predicts the probability of being real. The weights of the networks are modified to make the discriminator better at distinguishing real from fake data, and to make the generator better at fooling the discriminator.
  • Figure 3: The generative and discriminative losses throughout training. Ideally the loss of the discriminator should increase while that of the generator should decrease as the generator becomes better at mimicking the real data. (a) The default network with Gumbel-softmax temperature annealing. (b) The same setting as (a) but increasing the size of the generated samples to $1,000$. (c) Only varying the input vector temperature. (d) Only introducing random noise into the hidden state and not the cell state.
  • Figure 4: The generated text for MLE and GAN models. The plots (a)-(d) correspond to the models of Figure 3.