Generative Image Modeling Using Spatial LSTMs
Lucas Theis, Matthias Bethge
TL;DR
The paper tackles unsupervised, scalable generative modeling of natural images by introducing RIDE, a recurrent image density estimator that stacks spatial LSTMs whose hidden states condition a factorized mixture of conditional Gaussian scale mixtures (MCGSM). Because the model predicts each pixel from the pixels that precede it, the likelihood stays tractable while the recurrent state captures long-range, nonlinear dependencies across large images. Empirical results on BSDS300, dead leaves, and texture datasets show state-of-the-art or competitive log-likelihoods, with qualitative demonstrations in texture synthesis and inpainting. The approach provides a flexible building block for future deep generative models and highlights the value of combining spatial recurrence with mixture-based conditional densities for image modeling.
Abstract
Modeling the distribution of natural images is challenging, partly because of strong statistical dependencies which can extend over hundreds of pixels. Recurrent neural networks have been successful in capturing long-range dependencies in a number of problems but only recently have found their way into generative image models. We here introduce a recurrent image model based on multi-dimensional long short-term memory units which are particularly suited for image modeling due to their spatial structure. Our model scales to images of arbitrary size and its likelihood is computationally tractable. We find that it outperforms the state of the art in quantitative comparisons on several image datasets and produces promising results when used for texture synthesis and inpainting.
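To illustrate why a model that predicts each pixel from previously visited pixels has a tractable likelihood on images of any size, here is a minimal sketch in Python. It is not the paper's model: a single conditional Gaussian with a linear mean stands in for the spatial LSTM and MCGSM, and the names causal_neighborhood, log_likelihood, weights, sigma, and size are hypothetical choices made for this example only.

```python
import numpy as np

def causal_neighborhood(image, i, j, size=2):
    """Pixels above and to the left of (i, j), zero-padded at the borders."""
    padded = np.pad(image, ((size, 0), (size, size)))
    # In padded coordinates, pixel (i, j) sits at (i + size, j + size).
    rows_above = padded[i:i + size, j:j + 2 * size + 1].ravel()
    left_in_row = padded[i + size, j:j + size]
    return np.concatenate([rows_above, left_in_row])

def log_likelihood(image, weights, sigma=1.0, size=2):
    """Sum of per-pixel conditional log-densities (chain-rule factorization).

    Assumption: each conditional is a Gaussian whose mean is a linear
    function of the causal neighborhood. This is a toy stand-in for the
    recurrent, mixture-based conditionals described in the abstract.
    """
    height, width = image.shape
    total = 0.0
    for i in range(height):
        for j in range(width):
            context = causal_neighborhood(image, i, j, size)
            mean = weights @ context
            residual = image[i, j] - mean
            total += -0.5 * np.log(2 * np.pi * sigma ** 2)
            total += -0.5 * (residual / sigma) ** 2
    return total

# Usage: the same parameters score images of different sizes.
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=12)  # 12 context pixels when size=2
small = rng.normal(size=(8, 8))
large = rng.normal(size=(16, 24))
print(log_likelihood(small, weights), log_likelihood(large, weights))
```

Because the log-likelihood is a plain sum over pixels, evaluating it takes one pass over the image and never requires a normalization constant over all possible images. In this sketch the context is a fixed window; the recurrent model described above replaces that window with a hidden state that can summarize a much larger region.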
