Evaluation in Neural Style Transfer: A Review
Eleftherios Ioannou, Steve Maddock
TL;DR
This paper tackles the challenge of evaluating Neural Style Transfer (NST). It surveys the diverse qualitative, human, and quantitative evaluation methods used across image and video NST and highlights the lack of standardized benchmarks. It analyzes datasets, evaluation designs, and metrics, exposing inconsistencies that hinder fair comparison and reproducibility. The authors propose concrete recommendations: benchmark content and style datasets alongside evaluation-specific data, standardized human studies, and a concise set of quantitative metrics, all aimed at making NST assessments reliable, interpretable, and repeatable. The work emphasizes open data practices and statistical rigor as essential steps toward a unified evaluation framework, which would support fairer comparisons and faster progress in NST research and applications.
Abstract
The field of Neural Style Transfer (NST) has witnessed remarkable progress in the past few years, with approaches now able to synthesize artistic and photorealistic images and videos of exceptional quality. To evaluate such results, a diverse landscape of evaluation methods and metrics is used, including authors' opinions based on side-by-side comparisons, human evaluation studies that quantify the subjective judgements of participants, and a multitude of quantitative computational metrics that objectively assess different aspects of an algorithm's performance. However, there is no consensus on the most suitable and effective evaluation procedure for guaranteeing the reliability of the results. In this review, we provide an in-depth analysis of existing evaluation techniques, identify the inconsistencies and limitations of current evaluation methods, and give recommendations for standardized evaluation practices. We believe that a robust evaluation framework will not only enable more meaningful and fairer comparisons among NST methods but will also improve the understanding and interpretation of research findings in the field.
