A Framework for Critical Evaluation of Text-to-Image Models: Integrating Art Historical Analysis, Artistic Exploration, and Critical Prompt Engineering

Amalia Foka

TL;DR

Powerful text-to-image models raise concerns that technical metrics like FID and CLIP-based evaluations miss artistic, symbolic, and socio-cultural dimensions. The paper proposes an interdisciplinary evaluation framework that unites art historical analysis, artistic exploration, and critical prompt engineering to diagnose biases and cultural representations in AI-generated imagery. Through case studies and methodological steps, it demonstrates how this framework reveals gender, race, and cultural biases that conventional metrics overlook, and it outlines procedures for benchmarking and auditing that promote transparency and accountability. By fostering collaboration across computer science, art history, critical theory, and artistry, the approach aims to guide the responsible development of culturally sensitive, ethically sound, and inclusive AI-generated art.

Abstract

This paper proposes a novel interdisciplinary framework for the critical evaluation of text-to-image models, addressing the limitations of current technical metrics and bias studies. By integrating art historical analysis, artistic exploration, and critical prompt engineering, the framework offers a more nuanced understanding of these models' capabilities and societal implications. Art historical analysis provides a structured approach to examine visual and symbolic elements, revealing potential biases and misrepresentations. Artistic exploration, through creative experimentation, uncovers hidden potentials and limitations, prompting critical reflection on the algorithms' assumptions. Critical prompt engineering actively challenges the model's assumptions, exposing embedded biases. Case studies demonstrate the framework's practical application, showcasing how it can reveal biases related to gender, race, and cultural representation. This comprehensive approach not only enhances the evaluation of text-to-image models but also contributes to the development of more equitable, responsible, and culturally aware AI systems.

Paper Structure

This paper contains 12 sections, 3 figures.

Figures (3)

  • Figure 1: AI-generated interpretations of Jan van Eyck's The Arnolfini Portrait (1434).
  • Figure 2: AI-generated portraits exploring themes of resilience, dignity, and labor, inspired by the artistic approach of Kehinde Wiley.
  • Figure 3: DALL-E generated images of female and male construction site managers.