Demystifying Neural Style Transfer

Yanghao Li, Naiyan Wang, Jiaying Liu, Xiaodi Hou

TL;DR

This work reframes neural style transfer as a distribution-alignment problem by showing that Gram-matrix matching is equivalent to minimizing an MMD statistic with a second-order polynomial kernel. It situates style as a distribution over CNN feature activations and generalizes the approach by exploring MMD with different kernels and BN-statistics matching. The results demonstrate that multiple distribution-alignment methods yield competitive stylizations, highlighting the flexibility and potential of a domain-adaptation viewpoint for style transfer. The insights provide a foundation for designing new, kernel-aware style transfer methods with varied visual characteristics.

Abstract

Neural Style Transfer has recently demonstrated very exciting results that have caught the eye of both academia and industry. Despite the amazing results, the principle of neural style transfer, especially why Gram matrices can represent style, remains unclear. In this paper, we propose a novel interpretation of neural style transfer by treating it as a domain adaptation problem. Specifically, we theoretically show that matching the Gram matrices of feature maps is equivalent to minimizing the Maximum Mean Discrepancy (MMD) with the second-order polynomial kernel. Thus, we argue that the essence of neural style transfer is to match the feature distributions between the style images and the generated images. To further support our standpoint, we experiment with several other distribution alignment methods and achieve appealing results. We believe this novel interpretation connects these two important research fields and could enlighten future research.
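
The equivalence stated in the abstract can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes a single layer with N channels and M spatial positions shared by both images, uses random numbers in place of CNN activations, and leaves out the paper's normalization constants. The helper names (`gram`, `columns`, `poly2_kernel`, `mmd2`, `gram_distance`) are hypothetical. Under these assumptions, the squared Frobenius distance between the two Gram matrices should equal M^2 times the biased squared-MMD estimate with kernel k(x, y) = (x^T y)^2.

```python
import random


def gram(feats):
    """Gram matrix G = F F^T for feats stored as N channels x M positions."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in feats]
            for row_i in feats]


def columns(feats):
    """Return the M position-wise feature vectors, each of length N."""
    return [list(col) for col in zip(*feats)]


def poly2_kernel(x, y):
    """Second-order polynomial kernel k(x, y) = (x^T y)^2 (no offset term)."""
    return sum(a * b for a, b in zip(x, y)) ** 2


def mmd2(xs, ys, kernel):
    """Biased squared-MMD estimate between two samples of vectors."""
    def mean_k(a, b):
        return sum(kernel(u, v) for u in a for v in b) / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def gram_distance(f, s):
    """Squared Frobenius norm of the difference between two Gram matrices."""
    gf, gs = gram(f), gram(s)
    return sum((a - b) ** 2 for rf, rs in zip(gf, gs) for a, b in zip(rf, rs))


random.seed(0)
N, M = 3, 5  # illustrative sizes only
F = [[random.gauss(0.0, 1.0) for _ in range(M)] for _ in range(N)]  # "generated" features
S = [[random.gauss(0.0, 1.0) for _ in range(M)] for _ in range(N)]  # "style" features

lhs = gram_distance(F, S)
rhs = M ** 2 * mmd2(columns(F), columns(S), poly2_kernel)
print(lhs, rhs)  # the two values should agree up to floating-point error
```

The key design choice is to treat each spatial position (a column of the feature map) as one sample from the feature distribution, which is what lets the Gram matrix be read as a kernel statistic.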

Paper Structure

This paper contains 18 sections, 12 equations, 4 figures.

Figures (4)

  • Figure 1: Style reconstructions of the different methods at five layers. Each row corresponds to one method, and the reconstructions are obtained using only the style loss $\mathcal{L}_{style}$ with $\alpha=0$. Style representations are also reconstructed from different subsets of layers of the VGG network. For example, layer 3 uses the style loss of the first 3 layers ($w_1=w_2=w_3=1.0$ and $w_4=w_5=0.0$).
  • Figure 2: Results of the four methods (linear, poly, Gaussian and BN) with different balance factors $\gamma$. A larger $\gamma$ puts more emphasis on the style loss.
  • Figure 3: Visual results of several style transfer methods, including linear, poly, Gaussian and BN. The balance factors $\gamma$ in the six examples are $2.0$, $2.0$, $2.0$, $5.0$, $5.0$ and $5.0$, respectively.
  • Figure 4: Results of two fusion methods: BN + poly and linear + Gaussian. The top two rows are the results of the first fusion method and the bottom two rows correspond to the second one. Each column shows the results for one balance weight between the two methods. $\gamma$ is set to 5.0.
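
The four methods named in the captions (linear, poly, Gaussian and BN) differ in how the two feature distributions are compared. The sketch below is illustrative only and makes several assumptions: the kernel forms are the standard textbook ones, the offset `c`, degree `d` and bandwidth `sigma` are placeholder parameters, and the BN statistics loss is assumed to be the per-channel squared difference of means and standard deviations, averaged over channels. All function names are hypothetical, and the toy feature values stand in for real CNN activations.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def poly_kernel(x, y, c=0.0, d=2):
    # c and d are illustrative defaults, not values taken from the paper
    return (dot(x, y) + c) ** d


def gaussian_kernel(x, y, sigma=1.0):
    # sigma is an assumed bandwidth
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(xs, ys, kernel):
    """Biased squared-MMD estimate between two samples of vectors."""
    def mean_k(a, b):
        return sum(kernel(u, v) for u in a for v in b) / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def channel_stats(feats):
    """Per-channel mean and (population) standard deviation."""
    stats = []
    for row in feats:
        mu = sum(row) / len(row)
        var = sum((v - mu) ** 2 for v in row) / len(row)
        stats.append((mu, math.sqrt(var)))
    return stats


def bn_stat_loss(f, s):
    """Assumed BN-statistics loss: match channel means and standard deviations."""
    pairs = zip(channel_stats(f), channel_stats(s))
    total = sum((mf - ms) ** 2 + (sf - ss) ** 2 for (mf, sf), (ms, ss) in pairs)
    return total / len(f)


def columns(feats):
    """Position-wise feature vectors from an N channels x M positions layout."""
    return [list(col) for col in zip(*feats)]


# Toy features: 2 channels x 3 positions; S is F shifted by +2 in every channel.
F = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
S = [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]

bn_loss = bn_stat_loss(F, S)
lin_mmd = mmd2(columns(F), columns(S), linear_kernel)
gauss_mmd = mmd2(columns(F), columns(S), gaussian_kernel)
print(bn_loss, lin_mmd, gauss_mmd)
```

With the linear kernel, the squared MMD reduces to the squared distance between the two mean feature vectors, so on this shifted toy example it responds only to the mean offset, while the Gaussian kernel is sensitive to the full shape of the distributions.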