Generative Image Modeling Using Spatial LSTMs

Lucas Theis, Matthias Bethge

TL;DR

The paper tackles unsupervised, scalable generative modeling of natural images by introducing RIDE, a recurrent image density estimator that stacks spatial LSTMs to condition a factorized MCGSM. This framework enables tractable likelihood calculations while capturing long-range, nonlinear dependencies across large images. Empirical results across BSDS300, dead leaves, and texture datasets show state-of-the-art or competitive log-likelihood performance, with qualitative demonstrations in texture synthesis and inpainting. The approach provides a flexible building block for future deep generative models and highlights the value of combining spatial recurrence with mixture-based conditional densities for image modeling.

Abstract

Modeling the distribution of natural images is challenging, partly because of strong statistical dependencies which can extend over hundreds of pixels. Recurrent neural networks have been successful in capturing long-range dependencies in a number of problems but only recently have found their way into generative image models. We here introduce a recurrent image model based on multi-dimensional long short-term memory units which are particularly suited for image modeling due to their spatial structure. Our model scales to images of arbitrary size and its likelihood is computationally tractable. We find that it outperforms the state of the art in quantitative comparisons on several image datasets and produces promising results when used for texture synthesis and inpainting.
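The tractability claim can be read off a chain-rule factorization over pixels. As a sketch in notation assumed here (x_{ij} is the pixel in row i and column j, and \mathbf{x}_{<ij} is the set of pixels that precede it in the upper-left region), the image density and its log-likelihood can be written as:

```latex
\begin{align}
p(\mathbf{x}) &= \prod_{i,j} p\left(x_{ij} \mid \mathbf{x}_{<ij}\right), \\
\log p(\mathbf{x}) &= \sum_{i,j} \log p\left(x_{ij} \mid \mathbf{x}_{<ij}\right).
\end{align}
```

Each factor is a one-dimensional conditional density, so evaluating the likelihood of an image of any size amounts to one pass over its pixels, provided each conditional can be evaluated in closed form.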

Paper Structure

This paper contains 12 sections, 9 equations, 7 figures.

Figures (7)

  • Figure 1: (A) We factorize the distribution of images such that the prediction of a pixel (black) may depend on any pixel in the upper-left green region. (B) A graphical model representation of an MCGSM with a causal neighborhood limited to a small region. (C) A visualization of our recurrent image model with two layers of spatial LSTMs. The pixels of the image are represented twice and some arrows are omitted for clarity. Through feedforward connections, the prediction of a pixel depends directly on its neighborhood (green), but through recurrent connections it has access to the information in a much larger region (red).
  • Figure 2: Average log-likelihoods and log-likelihood rates for image patches (without/with DC comp.) and large images extracted from BSDS300 (Martin et al., 2001).
  • Figure 3: Average log-likelihood rates for image patches and large images extracted from van Hateren's dataset (van Hateren and van der Schaaf, 1998).
  • Figure 4: Average log-likelihood rates on dead leaf images. A deep recurrent image model is on a par with a deep diffusion model (Sohl-Dickstein et al., 2015). Using ensembles we are able to further improve the likelihood.
  • Figure 5: Model performance on dead leaves as a function of the causal neighborhood width. Simply increasing the neighborhood size of the MCGSM is not sufficient to improve performance.
  • ...and 2 more figures
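
The causal neighborhood described in the Figure 1 caption, and the way a pixel-wise factorization gives a tractable log-likelihood, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the `width` parameter, the zero padding at the image border, and above all the conditional density, which is a plain linear-Gaussian predictor standing in for the paper's MCGSM and spatial LSTM layers.

```python
import numpy as np


def causal_neighborhood(image, i, j, width=2):
    """Collect the pixels above and to the left of (i, j) within a window.

    Hypothetical helper: the window covers `width` rows above the pixel
    (spanning `width` columns to either side) plus `width` pixels to its
    left in the same row. Positions outside the image are zero-padded.
    """
    padded = np.pad(image, ((width, 0), (width, width)), mode="constant")
    # In padded coordinates, pixel (i, j) sits at (i + width, j + width).
    rows_above = padded[i:i + width, j:j + 2 * width + 1].ravel()
    left_in_row = padded[i + width, j:j + width]
    return np.concatenate([rows_above, left_in_row])


def gaussian_log_density(x, mean, sigma):
    """Log density of a univariate Gaussian."""
    return -0.5 * np.log(2.0 * np.pi * sigma ** 2) - (x - mean) ** 2 / (2.0 * sigma ** 2)


def image_log_likelihood(image, weights, sigma=1.0, width=2):
    """Sum per-pixel conditional log densities over the whole image.

    Stand-in conditional (an assumption, not the paper's model): each pixel
    is Gaussian with a mean that is linear in its causal neighborhood.
    """
    height, n_cols = image.shape
    total = 0.0
    for i in range(height):
        for j in range(n_cols):
            context = causal_neighborhood(image, i, j, width)
            total += gaussian_log_density(image[i, j], context @ weights, sigma)
    return total


# Example: score a random 8x8 image with a width-2 neighborhood.
rng = np.random.default_rng(0)
example_image = rng.normal(size=(8, 8))
n_context = 2 * (2 * 2 + 1) + 2  # pixels in a width-2 causal neighborhood
example_ll = image_log_likelihood(example_image, np.zeros(n_context), width=2)
```

Because the same conditional is applied at every position, the loop works for images of any size. In this sketch, widening the context means enlarging `width`. The Figure 5 caption notes that doing so for the MCGSM alone is not sufficient to improve performance, which is the motivation for reaching a larger region through recurrent connections instead.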