Gaussian-Bernoulli RBMs Without Tears

Renjie Liao, Simon Kornblith, Mengye Ren, David J. Fleet, Geoffrey Hinton

TL;DR

This work reexamines Gaussian-Bernoulli RBMs (GRBMs) by introducing a Gibbs-Langevin sampling scheme and a modified contrastive divergence algorithm that allows generating samples from noise, enabling direct comparisons with deep generative models. The authors show that modified contrastive divergence combined with gradient clipping yields robust training at large learning rates, reducing reliance on heuristic tricks. They demonstrate that GRBMs with a single hidden layer can unconditionally generate good samples on Gaussian mixtures and real image datasets (MNIST, FashionMNIST, CelebA), and they release code to support replication. These contributions advance practical GRBM learning and invite exploration of deeper Gaussian energy-based models and convolutional variants.

Abstract

We revisit the challenging problem of training Gaussian-Bernoulli restricted Boltzmann machines (GRBMs), introducing two innovations. We propose a novel Gibbs-Langevin sampling algorithm that outperforms existing methods like Gibbs sampling. We propose a modified contrastive divergence (CD) algorithm so that one can generate images with GRBMs starting from noise. This enables direct comparison of GRBMs with deep generative models, improving evaluation protocols in the RBM literature. Moreover, we show that modified CD and gradient clipping are enough to robustly train GRBMs with large learning rates, thus removing the necessity of various tricks in the literature. Experiments on Gaussian mixtures, MNIST, FashionMNIST, and CelebA show GRBMs can generate good samples, despite their single-hidden-layer architecture. Our code is released at: https://github.com/lrjconan/GRBM.
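
To make "generating samples from noise" concrete, here is a minimal NumPy sketch of plain block Gibbs sampling in a GRBM, the baseline the abstract compares against. It is not the authors' Gibbs-Langevin sampler, and it is not taken from the released code. It assumes the common GRBM energy E(v, h) = 0.5 * ||(v - mu) / sigma||^2 - (v / sigma^2)^T W h - b^T h, which gives p(h_j = 1 | v) = sigmoid(W^T (v / sigma^2) + b) and p(v | h) = N(W h + mu, diag(sigma^2)). The function names, argument names, and array shapes are hypothetical choices for this illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gibbs_step(v, W, b, mu, sigma, rng):
    """One block-Gibbs sweep (sample h given v, then v given h).

    Assumed shapes: v (n_chains, n_visible), W (n_visible, n_hidden),
    b (n_hidden,), mu and sigma (n_visible,).
    """
    # Hidden units are conditionally independent Bernoulli variables.
    p_h = sigmoid((v / sigma**2) @ W + b)
    h = (rng.random(p_h.shape) < p_h).astype(v.dtype)
    # Visible units are conditionally independent Gaussians.
    mean_v = h @ W.T + mu
    v_new = mean_v + sigma * rng.standard_normal(mean_v.shape)
    return v_new, h


def sample_from_noise(W, b, mu, sigma, n_chains, n_steps, seed=0):
    """Run Gibbs chains whose visible states start as standard normal noise."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((n_chains, W.shape[0]))
    h = None
    for _ in range(n_steps):
        v, h = gibbs_step(v, W, b, mu, sigma, rng)
    return v, h
```

With all weights set to zero the hidden units have no influence on the visibles, so the chains should return draws from N(mu, diag(sigma^2)), which is a quick sanity check for the conditionals.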

Paper Structure

This paper contains 28 sections, 28 equations, 9 figures, 3 tables, 5 algorithms.

Figures (9)

  • Figure 1: Density modelling using GRBMs on data from Gaussian mixtures with isotropic (rows 1 and 2) and anisotropic (rows 3 and 4) variances. Rows 1 and 3 show normalized GMM densities and (unnormalized) negative energy values for GRBMs. Rows 2 and 4 show samples drawn under different models and methods: (a) Ground Truth; (b) Gibbs; (c) Langevin wo. Adjust; (d) Langevin w. Adjust; (e) Gibbs-Langevin wo. Adjust; (f) Gibbs-Langevin w. Adjust.
  • Figure 2: Intermediate samples from Gibbs-Langevin sampling.
  • Figure 3: (a) Learning curve of (natural) log variances, (b) learned filters, and (c) samples on MNIST.
  • Figure 4: Samples from GRBMs on (a) FashionMNIST, (b) CelebA-32, and (c) CelebA-2K-64.
  • Figure 5: Samples from GRBMs learned with different sampling algorithms on MNIST.
  • ...and 4 more figures