Style Transfer Dataset: What Makes A Good Stylization?
Victor Kitov, Valentin Abramov, Mikhail Akhtyrchenko
TL;DR
The paper tackles the challenge of reproducible evaluation for image style transfer by introducing a large, permissively licensed dataset with 50 content images, 50 style images, and four style-size scales, yielding 10,000 stylizations rated by three annotators (totaling 30,000 ratings). Stylizations are generated with ArtFlow, recolored in LAB to separate color transfer from texture, and evaluated to identify quantitative and qualitative factors driving perceived quality. Key findings reveal that color and brightness diversity, edge preservation, and, most strongly, style size critically influence ratings, with notable sensitivity to face reproduction and content characteristics. The work provides practical recommendations and an evaluation tool to enable automated, comparable assessments across studies, advancing the development and tuning of style transfer methods for real-world use.
Abstract
We present a new dataset with the goal of advancing image style transfer, the task of rendering one image in the style of another. The dataset covers a variety of content and style images of different sizes and contains 10,000 stylizations, each manually rated by three annotators on a 1-10 scale. Based on the obtained ratings, we identify the factors most responsible for favourable and poor user evaluations and show which quantitative measures have a statistically significant impact on user grades. A methodology for creating style transfer datasets is discussed. The presented dataset can be used to automate multiple tasks related to style transfer configuration and evaluation.
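As a quick sanity check on the dataset size, the counts quoted in the TL;DR (50 content images, 50 style images, four style-size scales, three annotators) can be enumerated directly. This is only a minimal sketch: the integer identifiers are hypothetical placeholders and do not reflect the dataset's actual file naming or layout.

```python
from itertools import product

# Counts taken from the text above.
N_CONTENT = 50     # content images
N_STYLE = 50       # style images
N_SCALES = 4       # style-size scales
N_ANNOTATORS = 3   # annotators rating each stylization

# Hypothetical integer ids (assumption): one (content, style, scale) triple
# per stylization. The real dataset may index its images differently.
stylizations = list(product(range(N_CONTENT), range(N_STYLE), range(N_SCALES)))

n_stylizations = len(stylizations)
n_ratings = n_stylizations * N_ANNOTATORS

print(n_stylizations)  # number of stylized images
print(n_ratings)       # total number of individual ratings
```

Under these assumptions the grid has one entry per content/style/scale combination, which matches the 10,000 stylizations and 30,000 ratings stated above.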
