Learning to Evaluate the Artness of AI-generated Images
Junyu Chen, Jie An, Hanjia Lyu, Christopher Kanan, Jiebo Luo
TL;DR
This work introduces ArtScore, a reference-free, instance-level metric for evaluating the artness of AI-generated images. It constructs a pseudo-annotated dataset by transferring photorealistic StyleGAN2 models to artistic styles and generating interpolations between the two, so that artness is controlled by the interpolation weight $\alpha$. It then trains a neural network with a learn-to-rank objective (ListMLE) to predict relative artness. Empirical results show that ArtScore aligns more closely with human artistic judgments than Gram loss or ArtFID and improves artness ranking when combined with other metrics, validating its usefulness for evaluating and guiding art-focused image generation. The framework offers a scalable, objective tool for researchers and artists to quantify and compare the qualities of AI-generated art, with potential integration into model training and sampling pipelines.
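The blending step can be illustrated with a minimal sketch. It assumes that each mixed model is a linear interpolation of generator weights, $\theta_\alpha = (1-\alpha)\,\theta_{\text{photo}} + \alpha\,\theta_{\text{art}}$, and that $\alpha$ is quantized into evenly spaced levels. Both are assumptions made for illustration and may differ from the authors' exact scheme. The names `blend_params` and `mixed_models` and the dict-of-lists weight format are hypothetical stand-ins for real StyleGAN2 tensors.

```python
from typing import Dict, List

# Hypothetical weight format: parameter name -> flat list of floats
# (a stand-in for the tensors of a real StyleGAN2 generator).
Params = Dict[str, List[float]]


def blend_params(photo: Params, art: Params, alpha: float) -> Params:
    """Linearly interpolate two generators' weights (assumed blending rule).

    alpha = 0.0 returns the photo model and alpha = 1.0 the artistic model.
    Assumes both models share one architecture (same names and sizes), as
    they would if the artistic model was fine-tuned from the photo model.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if photo.keys() != art.keys():
        raise ValueError("models must share the same parameter names")
    blended: Params = {}
    for name, p in photo.items():
        a = art[name]
        if len(p) != len(a):
            raise ValueError(f"size mismatch for parameter {name!r}")
        blended[name] = [(1.0 - alpha) * x + alpha * y for x, y in zip(p, a)]
    return blended


def mixed_models(photo: Params, art: Params, levels: int) -> List[Params]:
    """Build `levels` mixed models with evenly spaced alphas in [0, 1].

    The list index of each model acts as its quantized artness
    pseudo-label (0 = most photorealistic, levels - 1 = most artistic).
    """
    if levels < 2:
        raise ValueError("need at least two artness levels")
    alphas = [i / (levels - 1) for i in range(levels)]
    return [blend_params(photo, art, a) for a in alphas]
```

Sampling the same latent code from every mixed model would then yield one image per artness level. The set would contain a photorealistic image, its artistic counterpart, and the interpolated images between them.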
Abstract
Assessing the artness of AI-generated images remains a challenge in image generation. Most existing metrics cannot perform instance-level, reference-free artness evaluation. This paper presents ArtScore, a metric designed to evaluate the degree to which an image resembles authentic artworks by artists (or, conversely, photographs), thereby offering a novel approach to artness assessment. We first blend pre-trained models for photo and artwork generation, resulting in a series of mixed models. We then use these mixed models to generate images with varying degrees of artness, together with pseudo-annotations. Each photorealistic image has a corresponding artistic counterpart and a series of interpolated images ranging from realistic to artistic. The resulting dataset is used to train a neural network that learns to estimate quantized artness levels of arbitrary images. Extensive experiments show that the artness levels predicted by ArtScore align more closely with human artistic evaluation than existing metrics such as Gram loss and ArtFID.
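The TL;DR names ListMLE as the ranking objective. ListMLE is the negative log-likelihood of the ground-truth ordering under a Plackett-Luce model. A minimal sketch for a single list follows. It assumes each list holds images of the same content at different artness levels, and that a higher pseudo-label (for example, the interpolation weight $\alpha$ or its quantized level) means more artistic. The function names are illustrative. A real training loop would compute this loss on framework tensors with automatic differentiation and average it over a batch of lists.

```python
import math
from typing import Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def listmle_loss(scores: Sequence[float], artness: Sequence[float]) -> float:
    """ListMLE loss for one list of images (illustrative sketch).

    scores:  predicted artness scores from the network, one per image.
    artness: ground-truth pseudo-labels; higher means more artistic.
             Assumes distinct labels within a list (ties keep input order).
    """
    if not scores or len(scores) != len(artness):
        raise ValueError("scores and artness must be non-empty and equal length")
    # Ground-truth permutation: indices from most to least artistic.
    order = sorted(range(len(scores)), key=lambda i: artness[i], reverse=True)
    ranked = [scores[i] for i in order]
    loss = 0.0
    for k in range(len(ranked)):
        # -log P(the k-th ranked image is chosen first among those remaining)
        loss += logsumexp(ranked[k:]) - ranked[k]
    return loss
```

Scores that follow the ground-truth order give a small loss. For example, `listmle_loss([0.0, 5.0], [0.0, 1.0])` is about 0.0067, while reversing the scores gives about 5.0067. Because the loss depends only on the relative order of the scores, the network learns relative artness rather than an absolute scale.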
