Fluent but Foreign: Even Regional LLMs Lack Cultural Alignment
Dhruv Agarwal, Anya Shukla, Sunayana Sitaram, Aditya Vashistha
TL;DR
The paper asks whether regionally trained Indian LLMs truly reflect Indian culture or merely produce fluent local-language text. Grounding its evaluation in Hofstede’s cultural onion, it compares six Indic models with six global baselines on four tasks (Value Orientation, Opinion Alignment, Cultural Knowledge, Cultural Adaptation) and supplements these with a 115-person writing study. Across the datasets (WVS, GlobalOpinionQA, CulturalBench, NormAd) and the user study, Indic models fail to outperform global models in aligning with Indian values, and often show stronger Western bias even after prompting or regional fine-tuning. The analysis relies on measures such as CAD_{q,m} = Sim_q(m, India) − Sim_q(m, USA), where a negative value means model m's answer to question q sits closer to the USA than to India, and nCAD, which normalizes for cross-country differences. The findings argue for thick×wide, community-grounded, and untranslated regional corpora paired with population-scale evaluation to build truly sovereign LLMs, with practical implications for HCI, NLP, and AI governance.
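To make the CAD formula concrete, here is a minimal Python sketch for a single question. The summary does not define Sim_q, so this sketch assumes it is one minus the total variation distance between two answer distributions; the function names and the example distributions are hypothetical and are not taken from the paper. nCAD is omitted because its normalization is not specified here.

```python
from typing import Sequence


def similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Assumed stand-in for Sim_q: 1 minus total variation distance.

    Both inputs are probability distributions over the same answer options.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same answer options")
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def cad(model: Sequence[float],
        india: Sequence[float],
        usa: Sequence[float]) -> float:
    """CAD_{q,m} = Sim_q(m, India) - Sim_q(m, USA) for one question q."""
    return similarity(model, india) - similarity(model, usa)


# Illustrative (made-up) answer distributions over three options.
model_answers = [0.6, 0.3, 0.1]
india_answers = [0.2, 0.3, 0.5]
usa_answers = [0.5, 0.4, 0.1]

score = cad(model_answers, india_answers, usa_answers)
```

With these made-up numbers the score is negative, meaning the hypothetical model's answers sit closer to the USA distribution than to the India one. Any other bounded similarity measure could be substituted for `similarity` without changing the structure of `cad`.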
Abstract
Large language models (LLMs) are used worldwide, yet exhibit Western cultural tendencies. Many countries are now building "regional" or "sovereign" LLMs, but it remains unclear whether they reflect local values and practices or merely speak local languages. Using India as a case study, we evaluate six Indic and six global LLMs on two dimensions, values and practices, grounded in nationally representative surveys and community-sourced QA datasets. Across tasks, Indic models do not align better with Indian norms than global models; in fact, a U.S. respondent is a closer proxy for Indian values than any Indic model. We further run a user study with 115 Indian users and find that writing suggestions from both global and Indic LLMs introduce Westernized or exoticized writing. Prompting and regional fine-tuning fail to recover alignment and can even degrade existing knowledge. We attribute this to scarce culturally grounded data, especially for pretraining. We position cultural evaluation as a first-class requirement alongside multilingual benchmarks and offer a reusable, community-grounded methodology. We call for native, community-authored corpora and thick×wide evaluations to build truly sovereign LLMs.
