The ArtBench Dataset: Benchmarking Generative Models with Artworks
Peiyuan Liao, Xiuyu Li, Xihui Liu, Kurt Keutzer
TL;DR
ArtBench-10 addresses a gap in artwork generation benchmarks by providing a standardized, class-balanced dataset of 60,000 artworks across 10 styles, with 3 target resolutions and a uniform preprocessing pipeline. It details an end-to-end data collection, annotation, and sampling pipeline to produce balanced, high-quality training and testing splits, and benchmarks a range of generative models (GANs, diffusion, VAEs) using IS, FID, KID, and improved precision/recall metrics. The results show strong performance for StyleGAN2-ADA across settings, while revealing trade-offs between quality and diversity across models and styles, and confirming non-memorization via nearest-neighbor analysis. The dataset aims to standardize artwork synthesis evaluation, while acknowledging biases toward European, North American, and East Asian art and outlining plans for broader coverage and responsible use in future work.
Abstract
We introduce ArtBench-10, the first class-balanced, high-quality, cleanly annotated, and standardized dataset for benchmarking artwork generation. It comprises 60,000 images of artwork from 10 distinctive artistic styles, with 5,000 training images and 1,000 testing images per style. ArtBench-10 has several advantages over previous artwork datasets. First, it is class-balanced, whereas most previous artwork datasets suffer from long-tailed class distributions. Second, the images are of high quality with clean annotations. Third, ArtBench-10 is created with standardized data collection, annotation, filtering, and preprocessing procedures. We provide three versions of the dataset at different resolutions ($32\times32$, $256\times256$, and original image size), formatted for easy use with popular machine learning frameworks. We also conduct extensive benchmarking experiments on ArtBench-10 using representative image synthesis models and present an in-depth analysis. The dataset is available at https://github.com/liaopeiyuan/artbench under a Fair Use license.
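
As a minimal sketch of what the class-balance claim implies, the following Python snippet recomputes the split sizes from the per-style counts stated in the abstract and checks that a label list is balanced. The helper names (`split_sizes`, `is_class_balanced`) and the synthetic integer labels are assumptions made for illustration; they are not part of the released dataset tooling.

```python
from collections import Counter

# Counts taken from the abstract.
NUM_STYLES = 10
TRAIN_PER_STYLE = 5_000
TEST_PER_STYLE = 1_000


def split_sizes(num_styles=NUM_STYLES,
                train_per_style=TRAIN_PER_STYLE,
                test_per_style=TEST_PER_STYLE):
    """Return the train, test, and total image counts for a balanced dataset."""
    train = num_styles * train_per_style
    test = num_styles * test_per_style
    return {"train": train, "test": test, "total": train + test}


def is_class_balanced(labels):
    """Return True if every class appears the same number of times."""
    counts = Counter(labels)
    return len(set(counts.values())) <= 1


# Hypothetical stand-in labels: integer style ids 0..9, one entry per training image.
train_labels = [style for style in range(NUM_STYLES)
                for _ in range(TRAIN_PER_STYLE)]

sizes = split_sizes()
print(sizes)                            # {'train': 50000, 'test': 10000, 'total': 60000}
print(is_class_balanced(train_labels))  # True
```

The same check could be applied to the labels of a real copy of the dataset once loaded; a long-tailed dataset would make `is_class_balanced` return `False`.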
