A Framework for Critical Evaluation of Text-to-Image Models: Integrating Art Historical Analysis, Artistic Exploration, and Critical Prompt Engineering
Amalia Foka
TL;DR
The rise of powerful text-to-image models raises the concern that technical metrics such as FID and CLIP-based evaluations miss the artistic, symbolic, and socio-cultural dimensions of generated imagery. The paper proposes an interdisciplinary evaluation framework that unites art historical analysis, artistic exploration, and critical prompt engineering to diagnose biases and cultural representations in AI-generated imagery. Through case studies and methodological steps, it demonstrates how this framework reveals gender, race, and cultural biases that conventional metrics overlook, and it outlines procedures for benchmarking and auditing that promote transparency and accountability. By fostering collaboration across computer science, art history, critical theory, and artistic practice, the approach aims to guide the responsible development of culturally sensitive, ethically sound, and inclusive AI-generated art.
Abstract
This paper proposes a novel interdisciplinary framework for the critical evaluation of text-to-image models, addressing the limitations of current technical metrics and bias studies. By integrating art historical analysis, artistic exploration, and critical prompt engineering, the framework offers a more nuanced understanding of these models' capabilities and societal implications. Art historical analysis provides a structured approach to examine visual and symbolic elements, revealing potential biases and misrepresentations. Artistic exploration, through creative experimentation, uncovers hidden potentials and limitations, prompting critical reflection on the algorithms' assumptions. Critical prompt engineering actively challenges the model's assumptions, exposing embedded biases. Case studies demonstrate the framework's practical application, showcasing how it can reveal biases related to gender, race, and cultural representation. This comprehensive approach not only enhances the evaluation of text-to-image models but also contributes to the development of more equitable, responsible, and culturally aware AI systems.
