Fast Patch-based Style Transfer of Arbitrary Style

Tian Qi Chen, Mark Schmidt

TL;DR

The paper introduces a patch-based style swap, computed within a single layer of a pretrained CNN, that fuses content structure with style textures. An inverse network then generates the image from the swapped activations, giving a feedforward pipeline that handles arbitrary content and style images without retraining for each style. The method produces consistent frame-by-frame results on video and is compared against optimization-based and fixed-style methods, with the trade-offs in quality and computation made explicit. Overall, it offers a practical balance between versatility and speed, with room for improvement in global style assessment and temporal coherence.

Abstract

Artistic style transfer is an image synthesis problem where the content of an image is reproduced with the style of another. Recent works show that a visually appealing style transfer can be achieved by using the hidden activations of a pretrained convolutional neural network. However, existing methods either apply (i) an optimization procedure that works for any style image but is very expensive, or (ii) an efficient feedforward network that only allows a limited number of trained styles. In this work we propose a simpler optimization objective based on local matching that combines the content structure and style textures in a single layer of the pretrained network. We show that our objective has desirable properties such as a simpler optimization landscape, intuitive parameter tuning, and consistent frame-by-frame performance on video. Furthermore, we use 80,000 natural images and 80,000 paintings to train an inverse network that approximates the result of the optimization. This results in a procedure for artistic style transfer that is efficient but also allows arbitrary content and style images.

Paper Structure

This paper contains 14 sections, 5 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: An example of our artistic style transfer method and its feedforward approximation. The approximation network has never seen the content or style image during training.
  • Figure 2: Illustration of a style swap operation. The 2D convolution extracts patches of size $3\times 3$ and stride $1$, and computes the normalized cross-correlations. There are $n_c=9$ spatial locations and $n_s=4$ feature channels immediately before and after the channel-wise argmax operation. The 2D transposed convolution reconstructs the complete activations by placing each best matching style patch at the corresponding spatial location.
  • Figure 3: The effect of style swapping in different layers of VGG-19 (Simonyan and Zisserman, 2014), and also in RGB space. Due to the naming convention of VGG-19, "relu$X$_1" refers to the first ReLU layer after the $(X-1)$-th maxpooling layer. The style swap operation uses patches of size $3\times 3$ and stride $1$, and then the RGB image is constructed using optimization.
  • Figure 4: We propose the first feedforward method for style transfer that can be used for arbitrary style images. We formulate style transfer using a constructive procedure (Style Swap) and train an inverse network to generate the image.
  • Figure 5: Our method achieves consistent results compared to existing optimization formulations. We see that the formulation of Gatys et al. (2015) has multiple local optima while we are able to consistently achieve the same style transfer effect with random initializations. Figure \ref{fig:quantconsistency} shows this quantitatively.
  • ...and 5 more figures
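
The style swap operation described in the Figure 2 caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `extract_patches` and `style_swap` are hypothetical, explicit loops stand in for the 2D convolution and transposed convolution, and averaging overlapping patches during reconstruction is an assumption made here. Only the style patches are normalized, because scaling by the content patch norm does not change which style patch wins the argmax.

```python
import numpy as np


def extract_patches(feat, size=3, stride=1):
    """Flatten every (C, size, size) patch of a (C, H, W) array into a row."""
    _, h, w = feat.shape
    rows = []
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            rows.append(feat[:, i:i + size, j:j + size].reshape(-1))
    return np.stack(rows)


def style_swap(content, style, size=3, stride=1, eps=1e-8):
    """Replace each content patch with its best-matching style patch.

    content, style: activations of shape (C, H, W) with the same C.
    Returns the swapped activations and the index of the chosen style
    patch for each content patch location.
    """
    c, h, w = content.shape
    cp = extract_patches(content, size, stride)  # (n_c, C*size*size)
    sp = extract_patches(style, size, stride)    # (n_s, C*size*size)

    # Normalized cross-correlation: unit-norm style patches act as filters.
    sp_unit = sp / (np.linalg.norm(sp, axis=1, keepdims=True) + eps)
    scores = cp @ sp_unit.T                      # (n_c, n_s)
    best = scores.argmax(axis=1)                 # best style patch per location

    # Place each chosen style patch at its location (the role played by the
    # transposed convolution) and average where patches overlap (assumption).
    out = np.zeros_like(content, dtype=float)
    count = np.zeros((1, h, w))
    k = 0
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            out[:, i:i + size, j:j + size] += sp[best[k]].reshape(c, size, size)
            count[:, i:i + size, j:j + size] += 1
            k += 1
    return out / np.maximum(count, 1), best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    content = rng.standard_normal((2, 5, 5))  # 3x3 patches, stride 1: n_c = 9
    style = rng.standard_normal((2, 4, 4))    # 3x3 patches, stride 1: n_s = 4
    swapped, best = style_swap(content, style)
    print(swapped.shape, best.shape)          # (2, 5, 5) (9,)
```

The toy shapes mirror the caption: a $5\times 5$ content activation gives $n_c=9$ patch locations and a $4\times 4$ style activation gives $n_s=4$ candidate patches. In the method itself the inputs would be activations from a pretrained network layer rather than random arrays.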