Texture Synthesis Using Convolutional Neural Networks

Leon A. Gatys, Alexander S. Ecker, Matthias Bethge

Abstract

Here we introduce a new model of natural textures based on the feature spaces of convolutional neural networks optimised for object recognition. Samples from the model are of high perceptual quality, demonstrating the generative power of neural networks trained in a purely discriminative fashion. Within the model, textures are represented by the correlations between feature maps in several layers of the network. We show that across layers the texture representations increasingly capture the statistical properties of natural images while making object information more and more explicit. The model provides a new tool to generate stimuli for neuroscience and might offer insights into the deep representations learned by convolutional neural networks.

Paper Structure

This paper contains 6 sections, 4 equations, 4 figures.

Figures (4)

  • Figure 1: Synthesis method. Texture analysis (left). The original texture is passed through the CNN and the Gram matrices $G_l$ on the feature responses of a number of layers are computed. Texture synthesis (right). A white noise image $\hat{\vec{x}}$ is passed through the CNN and a loss function $E_l$ is computed on every layer included in the texture model. The total loss function $\mathcal{L}$ is a weighted sum of the contributions $E_l$ from each layer. Using gradient descent on the total loss with respect to the pixel values, a new image is found that produces the same Gram matrices $\hat{G}_l$ as the original texture.
  • Figure 2: Generated stimuli. Each row corresponds to a different processing stage in the network. When the texture representation is constrained only on the lowest layer, the synthesised textures have little structure, similar to spectrally matched noise (first row). As the texture representation is matched on an increasing number of layers, the generated images become increasingly natural (rows 2–5; labels on the left indicate the top-most layer included). The source textures in the first three columns were previously used by Portilla and Simoncelli (2000). For better comparison we also show their results (last row). The last column shows textures generated from a non-texture image to give a better intuition about how the texture model represents image information.
  • Figure 3: A, Number of parameters in the texture model. We explore several ways to reduce the number of parameters in the texture model (see main text) and compare the results. B, Textures generated from the different layers of the Caffe reference network (Jia et al., 2014; Krizhevsky et al., 2012). The textures are of lesser quality than those generated with the VGG network. C, Textures generated with the VGG architecture but random weights. Texture synthesis fails in this case, indicating that learned filters are crucial for texture generation.
  • Figure 4: Performance of a linear classifier on top of the texture representations in different layers in classifying objects from the ImageNet dataset. High-level information is made increasingly explicit along the hierarchy of our texture model.
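
The quantities named in the Figure 1 caption (Gram matrices $G_l$, per-layer losses $E_l$, and the weighted total loss $\mathcal{L}$) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `(channels, height, width)` array layout, and the normalisation constant in `layer_loss` are assumptions made for illustration, and random arrays stand in for CNN feature maps.

```python
import numpy as np


def gram_matrix(features):
    """Gram matrix of one layer's feature maps.

    features: array of shape (N, H, W), i.e. N feature maps (assumed layout).
    Returns an (N, N) matrix of inner products between flattened feature maps.
    """
    n = features.shape[0]
    flat = features.reshape(n, -1)  # (N, M) with M = H * W
    return flat @ flat.T


def layer_loss(feats_orig, feats_synth):
    """Squared difference between the Gram matrices of two images on one layer.

    The 1 / (4 N^2 M^2) scaling is an assumed normalisation.
    """
    n = feats_orig.shape[0]
    m = feats_orig[0].size
    diff = gram_matrix(feats_orig) - gram_matrix(feats_synth)
    return float(np.sum(diff ** 2) / (4.0 * n ** 2 * m ** 2))


def total_loss(orig_layers, synth_layers, weights):
    """Weighted sum of the per-layer losses."""
    return sum(
        w * layer_loss(fo, fs)
        for w, fo, fs in zip(weights, orig_layers, synth_layers)
    )


# Stand-in "feature maps" for a single layer (random, not CNN outputs).
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 4, 5))

# Shuffling spatial positions identically in every feature map leaves the
# Gram matrix unchanged: the representation discards spatial arrangement.
perm = rng.permutation(4 * 5)
shuffled = feats.reshape(3, -1)[:, perm].reshape(3, 4, 5)
same_gram = np.allclose(gram_matrix(feats), gram_matrix(shuffled))
```

In a full synthesis loop the feature maps would come from a trained network, and the gradient of the total loss with respect to the pixels of the noise image would drive the optimisation; that part needs an automatic differentiation framework and is omitted here.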