Seeing The Words: Evaluating AI-generated Biblical Art
Hidde Makimei, Shuai Wang, Willem van Peursen
TL;DR
The paper addresses the challenge of evaluating AI-generated biblical imagery by constructing the Visio Divina Dataset (VDD), which contains 7,116 images produced by nine text-to-image tools from five biblical prompts. It pairs automated analyses of human figures and sentiment with manual religious and aesthetic assessments, and compares AI outputs to Renaissance and Baroque paintings to gauge faithfulness and atmosphere. Key findings show that Midjourney most closely resembles human art across several metrics, DALL·E often fails to capture contextual details, and Stable Diffusion yields varied and sometimes novel results, including nontraditional aesthetics. The work provides open-source data, an end-to-end evaluation workflow, and practical insights for using AI-generated biblical art in education and religion, and it outlines future improvements in prompts, detectors, and cross-domain evaluations.
Abstract
Recent years have witnessed a large number of Artificial Intelligence (AI) tools that can generate images from text. This has triggered discussion of whether AI can generate images from biblical text that are accurate with respect to the corresponding biblical contexts and backgrounds. Despite some small-scale attempts, little work has been done to systematically evaluate these generated images. In this work, we provide a large dataset of over 7K images generated using biblical text as prompts. These images were evaluated on various aspects with multiple neural network-based tools. We provide an assessment of accuracy and some analysis from the perspectives of religion and aesthetics. Finally, we discuss the use of the generated images and reflect on the performance of the AI generators.
