The Case for "Thick Evaluations" of Cultural Representation in AI
Rida Qadri, Mark Diaz, Ding Wang, Michael Madaio
TL;DR
This paper argues that current evaluations of AI-generated cultural representations are too thin, failing to capture interpretive and contextual meanings rooted in communities. It introduces thick evaluations, a discursive, community-informed framework developed through workshops with 37 participants from Sri Lanka, Pakistan, and India, and identifies five evaluation axes—incorrectness, missingness, specificity, coherence, and connotation. The authors demonstrate that representational goals are situated, dynamic, and negotiated through dialogue, requiring co-constructed metrics that draw on situated knowledge. This approach challenges purely quantitative evaluation paradigms and offers pathways to more culturally attuned AI outputs, bridging social science perspectives with AI practice.
Abstract
Generative AI model outputs have been increasingly evaluated for their (in)ability to represent non-Western cultures. We argue that these evaluations often operate through reductive ideals of representation, abstracted from how people define their own representation and neglecting the inherently interpretive and contextual nature of cultural representation. In contrast to these 'thin' evaluations, we introduce the idea of 'thick evaluations': a more granular, situated, and discursive measurement framework for evaluating representations of social worlds in AI outputs, steeped in communities' own understandings of representation. We develop this evaluation framework through workshops in South Asia, by studying the 'thick' ways in which people interpret and assign meaning to AI-generated images of their own cultures. We introduce practices for thicker evaluations of representation that expand the understanding of representation underpinning AI evaluations and, by co-constructing metrics with communities, bring measurement in line with the experiences of communities on the ground.
