Culturally-Aware Conversations: A Framework & Benchmark for LLMs
Shreya Havaldar, Sunny Rai, Young-Min Cho, Lyle Ungar
TL;DR
The paper introduces the Culturally-Aware Conversations (CAC) framework and benchmark to evaluate LLMs in multicultural dialogues. Grounded in sociocultural theory, CAC models how situational, relational, and cultural context shape linguistic style, and it operationalizes this with six situations, eight relationships, and a three-stage data pipeline that produces context-rich conversations. A culturally diverse annotation process yields a dataset of 48 conversations with 240 stylistically varied responses, plus a range-based notion of acceptable style to capture subjectivity. Evaluations of five leading LLMs show that Western norms are easier for models to mimic, revealing gaps in cross-cultural adaptation and highlighting the need for culturally competent conversational agents. The work provides a principled framework and high-quality data to drive the evaluation and development of culturally aware NLP systems.
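To make the dataset arithmetic concrete, the following is a minimal sketch, not the authors' code. It assumes one conversation per (situation, relationship) pair, which matches 6 x 8 = 48. The label names are placeholders because the summary gives only the counts, and the figure of five responses per conversation is inferred from 240 / 48 rather than stated. The `is_acceptable` helper and its numeric style score are hypothetical, included only to illustrate what a range-based notion of acceptable style could look like.

```python
from itertools import product

# Placeholder labels: the summary gives only the counts (6 and 8), not the names.
situations = [f"situation_{i}" for i in range(1, 7)]
relationships = [f"relationship_{j}" for j in range(1, 9)]

# Assumption: one conversation per (situation, relationship) pair.
conversations = list(product(situations, relationships))

# Inferred, not stated in the summary: 240 responses / 48 conversations.
responses_per_conversation = 240 // len(conversations)
total_responses = len(conversations) * responses_per_conversation


def is_acceptable(style_score: float, acceptable_range: tuple[float, float]) -> bool:
    """Hypothetical check: is a style score inside an inclusive acceptable range?"""
    low, high = acceptable_range
    return low <= style_score <= high
```

Under these assumptions, a response counts as correct when its style falls anywhere inside the range that raters from a given culture consider acceptable, rather than when it matches a single gold label.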
Abstract
Existing benchmarks that measure cultural adaptation in LLMs are misaligned with the actual challenges these models face when interacting with users from diverse cultural backgrounds. In this work, we introduce the first framework and benchmark designed to evaluate LLMs in realistic, multicultural conversational settings. Grounded in sociocultural theory, our framework formalizes how linguistic style, a key element of cultural communication, is shaped by situational, relational, and cultural context. We construct a benchmark dataset based on this framework, annotated by culturally diverse raters, and propose a new set of desiderata for cross-cultural evaluation in NLP: conversational framing, stylistic sensitivity, and subjective correctness. We evaluate today's top LLMs on our benchmark and show that these models struggle with cultural adaptation in a conversational setting.
