Table of Contents
Fetching ...

Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks

Alex J. Champandard

TL;DR

The paper tackles unpredictability in CNN-based style transfer by introducing semantic annotations that guide generation. It presents an augmented CNN architecture that injects semantic maps into feature representations, enabling content-aware style transfer and doodle-to-paintings. The approach preserves compatibility with patch-based methods and demonstrates improved control, fewer artifacts, and broader applicability, bridging image segmentation advances with image synthesis. This work offers practical tools for producing coherent, semantically consistent stylizations in portraits, landscapes, and beyond.

Abstract

Convolutional neural networks (CNNs) have proven highly effective at image synthesis and style transfer. For most users, however, using them as tools can be a challenging task due to their unpredictable behavior that goes against common intuitions. This paper introduces a novel concept to augment such generative architectures with semantic annotations, either by manually authoring pixel labels or using existing solutions for semantic segmentation. The result is a content-aware generative algorithm that offers meaningful control over the outcome. Thus, we increase the quality of images generated by avoiding common glitches, make the results look significantly more plausible, and extend the functional range of these algorithms---whether for portraits or landscapes, etc. Applications include semantic style transfer and turning doodles with few colors into masterful paintings!

Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks

TL;DR

The paper tackles unpredictability in CNN-based style transfer by introducing semantic annotations that guide generation. It presents an augmented CNN architecture that injects semantic maps into feature representations, enabling content-aware style transfer and doodle-to-paintings. The approach preserves compatibility with patch-based methods and demonstrates improved control, fewer artifacts, and broader applicability, bridging image segmentation advances with image synthesis. This work offers practical tools for producing coherent, semantically consistent stylizations in portraits, landscapes, and beyond.

Abstract

Convolutional neural networks (CNNs) have proven highly effective at image synthesis and style transfer. For most users, however, using them as tools can be a challenging task due to their unpredictable behavior that goes against common intuitions. This paper introduces a novel concept to augment such generative architectures with semantic annotations, either by manually authoring pixel labels or using existing solutions for semantic segmentation. The result is a content-aware generative algorithm that offers meaningful control over the outcome. Thus, we increase the quality of images generated by avoiding common glitches, make the results look significantly more plausible, and extend the functional range of these algorithms---whether for portraits or landscapes, etc. Applications include semantic style transfer and turning doodles with few colors into masterful paintings!

Paper Structure

This paper contains 18 sections, 4 equations, 6 figures.

Figures (6)

  • Figure 1: Synthesizing paintings with deep neural networks via analogy. (a) Original painting by Renoir, (b) semantic annotations, (c) desired layout, (d) generated output.
  • Figure 2: Comparison and breakdown of synthesized portraits, chosen because of extreme color and feature mismatches. Parameters were adjusted to make the style transfer most faithful while reducing artifacts such as patch repetition or odd blends---which proved challenging for the second column, but more straightforward in the last column thanks to semantic annotations. The top row shows transfer of painted style onto a photo (easier), and the bottom turning the painting into a photo (harder); see area around the nose and mouth for failures. [Original painting by Mia Bergeron.]
  • Figure 3: Our augmented CNN that uses regular filters of N channels (top), concatenated with a semantic map of M=1 channel (bottom) either output from another network capable of labeling pixels or as manual annotations.
  • Figure 4: Examples of semantic style transfer with Van Gogh painting. Annotations for nose and mouth are not required as the images are similar, however carefully annotating the eyeballs helps when generating photo-quality portraits. [Photo by Seth Johnson, concept by Kyle McDonald.]
  • Figure 5: Varying parameters for the style transfer. First column shows changes in style weight $\beta$: 0) content reconstruction, 10 to 50) artifact-free blends thanks to semantic constraint, 250) best style quality. Second column shows values of semantic weight $\gamma$: 0) style overpowers content without semantic constraint, 10) low semantic weight strengthens influence of style, 50) default value that equalizes channels, 250) high semantic weight lowers quality of style.
  • ...and 1 more figures