
Culturally-Aware Conversations: A Framework & Benchmark for LLMs

Shreya Havaldar, Sunny Rai, Young-Min Cho, Lyle Ungar

TL;DR

The paper introduces the Culturally-Aware Conversations (CAC) framework and benchmark to evaluate LLMs in multicultural dialogues. Grounded in sociocultural theory, CAC models how situational, relational, and cultural context shape linguistic style, and it operationalizes this with six situations, eight relationships, and a three-stage data pipeline that produces context-rich conversations. A culturally diverse annotation process yields a dataset of 48 conversations with 240 stylistically varied responses, and the benchmark uses a range-based notion of acceptable style to capture subjectivity. Evaluations of five leading LLMs show that Western norms are easier for models to mimic, revealing gaps in cross-cultural adaptation and highlighting the need for culturally competent conversational agents. The work provides a principled framework and high-quality data to drive the evaluation and development of culturally aware NLP systems.
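The counts above fit a full grid: if each of the six situations is paired with each of the eight relationships, that gives 6 × 8 = 48 conversations, and 240 / 48 = 5 responses per conversation. The sketch below assumes that grid layout and an integer position for each response on its stylistic axis, and shows one hypothetical way a range-based notion of acceptable style could score a model: a chosen style counts as correct anywhere inside a culture's accepted range rather than only at a single gold value. The placeholder labels and the names `in_accepted_range` and `range_accuracy` are illustrative assumptions, not the paper's code or its exact metric.

```python
from itertools import product

# Placeholder labels: the actual situation and relationship names are not
# reproduced in this summary, so these stand in for them.
SITUATIONS = [f"situation_{i}" for i in range(1, 7)]        # six situations
RELATIONSHIPS = [f"relationship_{j}" for j in range(1, 9)]  # eight relationships
RESPONSES_PER_CONVERSATION = 5  # assumption: 240 responses spread evenly over 48

# Assumption: one conversation per (situation, relationship) pair.
conversations = list(product(SITUATIONS, RELATIONSHIPS))
total_responses = len(conversations) * RESPONSES_PER_CONVERSATION


def in_accepted_range(style: int, accepted: tuple[int, int]) -> bool:
    """True if a style position lies inside a culture's accepted [low, high] range."""
    low, high = accepted
    return low <= style <= high


def range_accuracy(predicted: list[int], accepted: list[tuple[int, int]]) -> float:
    """Hypothetical metric: share of model-chosen style positions inside the accepted range."""
    if len(predicted) != len(accepted) or not predicted:
        raise ValueError("need equal-length, non-empty inputs")
    hits = sum(in_accepted_range(p, r) for p, r in zip(predicted, accepted))
    return hits / len(predicted)


print(len(conversations), total_responses)
print(range_accuracy([2, 5, 3], [(1, 3), (2, 4), (3, 3)]))
```

The first line printed is `48 240`; for the toy predictions, two of the three choices fall inside their ranges, so the accuracy is 2/3. Scoring against a range instead of a point is what lets several stylistically different answers all count as acceptable for the same culture.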

Abstract

Existing benchmarks that measure cultural adaptation in LLMs are misaligned with the actual challenges these models face when interacting with users from diverse cultural backgrounds. In this work, we introduce the first framework and benchmark designed to evaluate LLMs in realistic, multicultural conversational settings. Grounded in sociocultural theory, our framework formalizes how linguistic style, a key element of cultural communication, is shaped by situational, relational, and cultural context. We construct a benchmark dataset based on this framework, annotated by culturally diverse raters, and propose a new set of desiderata for cross-cultural evaluation in NLP: conversational framing, stylistic sensitivity, and subjective correctness. We evaluate today's top LLMs on our benchmark and show that these models struggle with cultural adaptation in a conversational setting.

Paper Structure

This paper contains 15 sections, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Three key factors influence appropriate linguistic style in conversation: Situation (the specific scenario of an interaction), Interpersonal Relationship (the social dynamic between the speakers), and Cultural Context (the background, values, and beliefs of the participants).
  • Figure 2: The Culturally-Aware Conversations (CAC) Framework. We work with cultural experts to determine common conversational situations with the highest variance in typical behavior across cultures. After establishing these situations, we pinpoint which stylistic axis best captures the cultural variance of each situation. We also determine eight interpersonal relationships whose dynamics vary across cultures and additionally influence the appropriate linguistic style for the given situations.
  • Figure 3: A depiction of how we use the CAC framework to develop a contextualized conversation in our dataset. We walk through an example where the situation is giving critical feedback and the interpersonal relationship is Boss–Employee. In Stage 1, we generate a specific scenario that reflects the situational and relational context. In Stage 2, we use the scenario and stylistic axis to generate a conversation with a range of possible responses that vary on the given stylistic axis. In Stage 3, we recruit annotators from a range of nations to determine which responses are most desirable in which cultures.
  • Figure A1: Cultural differences in day-to-day conversations. We show the mean and accepted range of style values for conversations with strangers, neighbors, and friends.
  • Figure A2: Cultural differences in professional conversations. We show the mean and accepted range of style values for conversations between a boss and an employee and between coworkers.
  • One additional figure is not listed here.
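The three stages in Figure 3 and the per-culture "mean and accepted range of style values" in Figures A1 and A2 can be pictured with a small data model. The sketch below is a minimal illustration under stated assumptions: the field names, the axis name `directness`, the culture labels `culture_A` and `culture_B`, and the aggregation rule (pool every accepted position for the mean; keep positions accepted by at least `min_share` of annotators for the range) are all hypothetical, not the paper's schema or procedure.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional, Tuple


@dataclass
class Conversation:
    """One benchmark item; field names are illustrative, not the paper's schema."""
    situation: str        # e.g. "giving critical feedback"
    relationship: str     # e.g. "Boss–Employee"
    stylistic_axis: str   # hypothetical axis name
    scenario: str         # Stage 1 output
    responses: list[str]  # Stage 2 output, ordered low -> high on the axis
    # Stage 3 output: culture -> one set of acceptable positions (1-based) per annotator
    judgments: dict[str, list[set[int]]] = field(default_factory=dict)


def summarize(
    judgments: list[set[int]], min_share: float = 0.5
) -> Tuple[Optional[float], Optional[Tuple[int, int]]]:
    """Pool one culture's annotations into (mean style, accepted range).

    Assumed rule: the mean averages every position any annotator accepted;
    the range spans the lowest to highest position accepted by at least
    `min_share` of annotators.
    """
    pooled = [pos for accepted in judgments for pos in accepted]
    counts = Counter(pooled)
    n = len(judgments)
    kept = sorted(pos for pos, c in counts.items() if c / n >= min_share)
    style_mean = mean(pooled) if pooled else None
    style_range = (kept[0], kept[-1]) if kept else None
    return style_mean, style_range


item = Conversation(
    situation="giving critical feedback",
    relationship="Boss–Employee",
    stylistic_axis="directness",
    scenario="(Stage 1 scenario text)",
    responses=[f"(response at directness {k})" for k in range(1, 6)],
    judgments={
        "culture_A": [{4, 5}, {3, 4, 5}, {4}],
        "culture_B": [{1, 2}, {2, 3}, {2}],
    },
)

for culture, votes in item.judgments.items():
    print(culture, summarize(votes))
```

With `min_share=0.5`, the toy `culture_A` annotators yield an accepted range of (4, 5) and `culture_B` a range of (2, 2). Raising `min_share` narrows each range toward the positions with the broadest agreement, which is one way to trade coverage of subjective judgments against consensus.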