Table of Contents
Fetching ...

REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models

George Tucker, Andriy Mnih, Chris J. Maddison, Dieterich Lawson, Jascha Sohl-Dickstein

TL;DR

The paper tackles the problem of high-variance gradient estimates for models with discrete latent variables by introducing REBAR, a novel unbiased gradient estimator that combines REINFORCE with a reparameterization-based control variate derived from a continuous relaxation. By conditioning the control variate and optimally tuning the relaxation temperature online, REBAR achieves state-of-the-art variance reduction and faster convergence on generative and structured prediction tasks. The approach unifies and extends prior methods (MuProp, NVIL, Concrete) with a principled, unbiased framework and demonstrates strong empirical gains on MNIST, Omniglot, and related tasks. These results suggest practical benefits for training deep models with discrete stochastic components and open avenues for RL and multi-layer extensions.

Abstract

Learning in models with discrete latent variables is challenging due to high variance gradient estimators. Generally, approaches have relied on control variates to reduce the variance of the REINFORCE estimator. Recent work (Jang et al. 2016, Maddison et al. 2016) has taken a different approach, introducing a continuous relaxation of discrete variables to produce low-variance, but biased, gradient estimates. In this work, we combine the two approaches through a novel control variate that produces low-variance, \emph{unbiased} gradient estimates. Then, we introduce a modification to the continuous relaxation and show that the tightness of the relaxation can be adapted online, removing it as a hyperparameter. We show state-of-the-art variance reduction on several benchmark generative modeling tasks, generally leading to faster convergence to a better final log-likelihood.

REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models

TL;DR

The paper tackles the problem of high-variance gradient estimates for models with discrete latent variables by introducing REBAR, a novel unbiased gradient estimator that combines REINFORCE with a reparameterization-based control variate derived from a continuous relaxation. By conditioning the control variate and optimally tuning the relaxation temperature online, REBAR achieves state-of-the-art variance reduction and faster convergence on generative and structured prediction tasks. The approach unifies and extends prior methods (MuProp, NVIL, Concrete) with a principled, unbiased framework and demonstrates strong empirical gains on MNIST, Omniglot, and related tasks. These results suggest practical benefits for training deep models with discrete stochastic components and open avenues for RL and multi-layer extensions.

Abstract

Learning in models with discrete latent variables is challenging due to high variance gradient estimators. Generally, approaches have relied on control variates to reduce the variance of the REINFORCE estimator. Recent work (Jang et al. 2016, Maddison et al. 2016) has taken a different approach, introducing a continuous relaxation of discrete variables to produce low-variance, but biased, gradient estimates. In this work, we combine the two approaches through a novel control variate that produces low-variance, \emph{unbiased} gradient estimates. Then, we introduce a modification to the continuous relaxation and show that the tightness of the relaxation can be adapted online, removing it as a hyperparameter. We show state-of-the-art variance reduction on several benchmark generative modeling tasks, generally leading to faster convergence to a better final log-likelihood.

Paper Structure

This paper contains 26 sections, 39 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Log variance of the gradient estimator (left) and loss (right) for the toy problem with $t = 0.45$. Only the unbiased estimators converge to the correct answer. We indicate the temperature in parenthesis where relevant.
  • Figure 2: Log variance of the gradient estimator for the two layer linear model (left) and single layer nonlinear model (right) on the MNIST generative modeling task. All of the estimators are unbiased, so their variance is directly comparable. We estimated moments from exponential moving averages (with decay=0.999; we found that the results were robust to the exact value). The temperature is shown in parenthesis where relevant.
  • Figure 3: Training variational lower bound for the two layer linear model (left) and single layer nonlinear model (right) on the MNIST generative modeling task. We plot 5 trials over different random initializations for each method with the median trial highlighted. The temperature is shown in parenthesis where relevant.
  • Figure 4: Log variance of the gradient estimator for the two layer linear model (left) and single layer nonlinear model (right) on the structured prediction task.
  • Figure 5: Training variational lower bound for the two layer linear model (left) and single layer nonlinear model (right) on the structured prediction task. We plot 5 trials over different random initializations for each method with the median trial highlighted.
  • ...and 6 more figures