The Case for "Thick Evaluations" of Cultural Representation in AI

Rida Qadri, Mark Diaz, Ding Wang, Michael Madaio

TL;DR

This paper argues that current evaluations of AI-generated cultural representations are too thin: they fail to capture the interpretive and contextual meanings rooted in communities. It introduces thick evaluations, a discursive, community-informed framework developed through workshops with 37 participants from Sri Lanka, Pakistan, and India, and identifies five evaluation axes: incorrectness, missingness, specificity, coherence, and connotation. The authors demonstrate that representational goals are situated, dynamic, and negotiated through dialogue, and therefore require co-constructed metrics that draw on situated knowledge. This approach challenges purely quantitative evaluation paradigms and offers pathways to more culturally attuned AI outputs, bridging social science perspectives with AI practice.

Abstract

Generative AI model outputs have been increasingly evaluated for their (in)ability to represent non-Western cultures. We argue that these evaluations often operate through reductive ideals of representation, abstracted from how people define their own representation and neglecting the inherently interpretive and contextual nature of cultural representation. In contrast to these 'thin' evaluations, we introduce the idea of 'thick evaluations': a more granular, situated, and discursive measurement framework for evaluating representations of social worlds in AI outputs, steeped in communities' own understandings of representation. We develop this evaluation framework through workshops in South Asia, by studying the 'thick' ways in which people interpret and assign meaning to AI-generated images of their own cultures. We introduce practices for thicker evaluations of representation that expand the understanding of representation underpinning AI evaluations and, by co-constructing metrics with communities, bring measurement in line with the experiences of communities on the ground.

Paper Structure

This paper contains 27 sections, 3 tables.