'Too much alignment; not enough culture': Re-balancing cultural alignment practices in LLMs
Eric J. W. Orlowski, Hakim Norhashim, Tristan Koh Ly Wey
TL;DR
The paper critiques current cultural alignment in LLMs for overreliance on demographic proxies and benchmarked metrics, arguing that culture's depth requires interpretive, qualitative methods. It introduces the notion of thick outputs, grounded in Geertz's thick description, and posits three necessary conditions (cultural representation, capacity for thick outputs, and prompt anchoring) for meaningful, context-specific alignment. By embracing fractal complexity and targeted cultural contexts, the authors propose ethnographic evaluation and cross-disciplinary collaboration as essential to assessing and refining culturally sensitive AI. The work aims to move beyond surface-level cultural artefacts toward ethically responsible, deeply contextual AI that acknowledges human complexity and variability.
Abstract
While cultural alignment has increasingly become a focal point within AI research, current approaches, which rely predominantly on quantitative benchmarks and simplistic proxies, fail to capture the deeply nuanced and context-dependent nature of human cultures. Existing alignment practices typically reduce culture to static demographic categories or superficial cultural facts, thereby sidestepping critical questions about what it truly means to be culturally aligned. This paper argues for a fundamental shift towards integrating interpretive qualitative approaches drawn from the social sciences into AI alignment practices, specifically in the context of Large Language Models (LLMs). Drawing inspiration from Clifford Geertz's concept of "thick description," we propose that AI systems must produce outputs that reflect deeper cultural meanings (what we term "thick outputs"), grounded firmly in user-provided context and intent. We outline three necessary conditions for successful cultural alignment: sufficiently scoped cultural representations, the capacity for nuanced outputs, and the anchoring of outputs in the cultural contexts implied within prompts. Finally, we call for cross-disciplinary collaboration and the adoption of qualitative, ethnographic evaluation methods as vital steps toward developing AI systems that are genuinely culturally sensitive, ethically responsible, and reflective of human complexity.
