BANGLASOCIALBENCH: A Benchmark for Evaluating Sociopragmatic and Cultural Alignment of LLMs in Bangladeshi Social Interaction

Tanvir Ahmed Sijan, S. M Golam Rifat, Pankaj Chowdhury Partha, Md. Tanjeed Islam, Md. Musfique Anwar

Abstract

Large Language Models (LLMs) have demonstrated strong multilingual fluency, yet fluency alone does not guarantee socially appropriate language use. In high-context languages, communicative competence requires sensitivity to social hierarchy, relational roles, and interactional norms that are encoded directly in everyday language. Bangla exemplifies this challenge through its three-tiered pronominal system, kinship-based addressing, and culturally embedded social customs. We introduce BANGLASOCIALBENCH, the first benchmark designed to evaluate sociopragmatic competence in Bangla through context-dependent language use rather than factual recall. The benchmark spans three domains (Bangla Address Terms, Kinship Reasoning, and Social Customs) and comprises 1,719 culturally grounded instances written and verified by native Bangla speakers. We evaluate twelve contemporary LLMs in a zero-shot setting and observe systematic patterns of cultural misalignment. Models frequently default to overly formal address forms, fail to recognize that more than one address pronoun can be socially acceptable, and conflate kinship terminology across religious contexts. Our findings show that these sociopragmatic failures are often structured rather than random, revealing persistent limitations in how current LLMs infer and apply culturally appropriate language in realistic Bangladeshi social interactions.
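The abstract separates two address-term failures: defaulting to overly formal forms, and missing that several pronouns can be acceptable in one context. A minimal Python sketch of how such errors could be labeled follows, assuming each instance carries a *set* of acceptable second-person pronouns and that the three Bangla tiers are ordered by formality as তুই (tui) < তুমি (tumi) < আপনি (apni). The `Tier` enum and the `classify` and `error_profile` helpers are hypothetical illustrations, not the benchmark's scoring code.

```python
from collections import Counter
from enum import IntEnum
from typing import Iterable


class Tier(IntEnum):
    """Bangla second-person pronoun tiers, ordered from least to most formal."""
    TUI = 0   # তুই: intimate
    TUMI = 1  # তুমি: familiar
    APNI = 2  # আপনি: formal / deferential


def classify(predicted: Tier, acceptable: set[Tier]) -> str:
    """Label one prediction against the set of tiers judged acceptable (assumed schema)."""
    if not acceptable:
        raise ValueError("acceptable must contain at least one tier")
    if predicted in acceptable:
        # Any acceptable form counts, not only a single 'gold' pronoun.
        return "appropriate"
    if predicted > max(acceptable):
        return "over-polite"   # more formal than every acceptable option
    if predicted < min(acceptable):
        return "under-polite"  # less formal than every acceptable option
    # Falls between acceptable tiers, e.g. TUMI when only {TUI, APNI} fit.
    return "other"


def error_profile(pairs: Iterable[tuple[Tier, set[Tier]]]) -> Counter:
    """Count labels over (predicted tier, acceptable tiers) pairs."""
    return Counter(classify(pred, ok) for pred, ok in pairs)


# Hypothetical toy instances, for illustration only.
pairs = [
    (Tier.APNI, {Tier.TUMI}),             # over-polite
    (Tier.APNI, {Tier.TUMI, Tier.APNI}),  # appropriate: two forms fit
    (Tier.TUI, {Tier.TUMI}),              # under-polite
    (Tier.APNI, {Tier.TUI}),              # over-polite
]
profile = error_profile(pairs)
print(profile["over-polite"], profile["under-polite"], profile["appropriate"])  # 2 1 1
```

Ordering the tiers as an `IntEnum` lets the direction of an error be read off by comparison; a prediction lying between two acceptable tiers is kept in a separate bucket because it is neither uniformly over- nor under-polite.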

Paper Structure (85 sections, 11 equations, 9 figures, 11 tables)

Figures (9)

  • Figure 1: Prompt design grounded in Hymes' SPEAKING model (hymes1962ethnography). Each prompt operationalizes the sociolinguistic context through explicit cues for setting, participants, gender, interactional goal, and social norms, enabling controlled evaluation of culturally appropriate Bangla Address Terms.
  • Figure 2: Dataset creation pipeline for BanglaSocialBench. The English prompts shown in the diagram are translations included for illustration only; all model evaluations were conducted exclusively with Bangla prompts.
  • Figure 3: Overall benchmark accuracy of evaluated LLMs on BanglaSocialBench, computed as the macro-average of performance across Address Terms, Kinship Reasoning, and Social Customs.
  • Figure 4: Asymmetry in inappropriate politeness use across LLMs in Bangla pronominal addressing, with over-politeness occurring substantially more frequently than under-politeness.
  • Figure 5: Directional cross-religious kinship term misalignment: proportions of culturally inappropriate kinship term substitutions under explicit identity cues, implicit cues, and open-ended prompting. Misalignment is more pronounced in the direction of substituting Muslim-associated kinship terms into Hindu-marked contexts.
  • ...and 4 more figures
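Figure 3 reports overall accuracy as a macro-average over the three domains. A minimal Python sketch of that computation follows, assuming per-domain accuracy is simply correct/total within each domain. The function names and the counts are hypothetical: they do not reflect the benchmark's actual domain sizes or any model's scores. A pooled (micro) average is included only for contrast.

```python
def macro_average(per_domain: dict[str, tuple[int, int]]) -> float:
    """Unweighted mean of per-domain accuracies (each domain counts equally)."""
    if not per_domain:
        raise ValueError("need at least one domain")
    accuracies = [correct / total for correct, total in per_domain.values()]
    return sum(accuracies) / len(accuracies)


def micro_average(per_domain: dict[str, tuple[int, int]]) -> float:
    """Pooled accuracy over all instances (larger domains weigh more)."""
    correct = sum(c for c, _ in per_domain.values())
    total = sum(t for _, t in per_domain.values())
    return correct / total


# Hypothetical (correct, total) counts for illustration only; NOT reported results.
counts = {
    "Address Terms": (75, 100),
    "Kinship Reasoning": (25, 50),
    "Social Customs": (100, 400),
}
print(f"macro={macro_average(counts):.3f} micro={micro_average(counts):.3f}")
# macro=0.500 micro=0.364
```

Macro-averaging gives each domain equal weight regardless of how many instances it contains, so a model cannot raise its overall score just by doing well on the largest domain; the micro average in the same toy example is noticeably lower because the weakest domain is also the largest.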