Hire Your Anthropologist! Rethinking Culture Benchmarks Through an Anthropological Lens

Mai AlKhamissi; Yunze Xiao; Badr AlKhamissi; Mona Diab

TL;DR

This paper argues that current culture benchmarks in NLP reduce culture to static facts or homogeneous preferences, which clashes with anthropological views of culture as dynamic and situated. It offers a four-part taxonomy (Culture-as-Knowledge, Culture-as-Preference, Culture-as-Dynamics, and Culture-as-Bias) and uses it to analyze 20 benchmarks, revealing six recurrent methodological issues. The authors propose concrete improvements, including real-world narratives, participatory design, contextual evaluation, and treating disagreement as a data signal, to build more nuanced benchmarks. By bridging social science with NLP practice, the work provides a roadmap for evaluating and mitigating cultural biases while capturing the lived, contested nature of culture in AI systems.

Abstract

Cultural evaluation of large language models has become increasingly important, yet current benchmarks often reduce culture to static facts or homogeneous values. This view conflicts with anthropological accounts that emphasize culture as dynamic, historically situated, and enacted in practice. To analyze this gap, we introduce a four-part framework that categorizes how benchmarks frame culture: as knowledge, preference, performance, or bias. Using this lens, we qualitatively examine 20 cultural benchmarks and identify six recurring methodological issues, including treating countries as cultures, overlooking within-culture diversity, and relying on oversimplified survey formats. Drawing on established anthropological methods, we propose concrete improvements: incorporating real-world narratives and scenarios, involving cultural communities in design and validation, and evaluating models in context rather than in isolation. Our aim is to guide the development of cultural benchmarks that go beyond static recall tasks and more accurately capture how models respond to complex cultural situations.
