Bridging the Culture Gap: A Framework for LLM-Driven Socio-Cultural Localization of Math Word Problems in Low-Resource Languages

Israel Abebe Azime, Tadesse Destaw Belay, Dietrich Klakow, Philipp Slusallek, Anshuman Chhabra

TL;DR

The paper tackles the problem of evaluating and improving LLM performance on culturally grounded math word problems (MWPs) in low-resource languages by introducing an automated socio-cultural localization framework that replaces English-centric entities with native names, organizations, and currencies. It presents a pipeline with entity classification, local entity databases, and one-shot LLM-based localization plus quality checks, and evaluates on GSM8K/AfriMGSM data using translations generated with NLLB-200 and screened by COMET, focusing on robustness to entity variations. The study finds that translations alone can obscure true multilingual math ability, while locale-aware variants reveal performance disparities and can improve robustness when used to augment training data; the gains are language- and model-dependent. Overall, the framework enables scalable creation of culturally aligned MWPs, reduces English-centric biases, and offers a path toward more culturally faithful benchmarks and multilingual reasoning capabilities in LLMs.

Abstract

Large language models (LLMs) have demonstrated significant capabilities in solving mathematical problems expressed in natural language. However, multilingual and culturally grounded mathematical reasoning in low-resource languages lags behind English due to the scarcity of socio-cultural task datasets that reflect accurate native entities such as person names, organization names, and currencies. Existing multilingual benchmarks are predominantly produced via translation and typically retain English-centric entities, owing to the high cost associated with human annotator-based localization. Moreover, automated localization tools are limited, and hence, truly localized datasets remain scarce. To bridge this gap, we introduce a framework for LLM-driven cultural localization of math word problems that automatically constructs datasets with native names, organizations, and currencies from existing sources. We find that translated benchmarks can obscure true multilingual math ability under appropriate socio-cultural contexts. Through extensive experiments, we also show that our framework can help mitigate English-centric entity bias and improve robustness when native entities are introduced across various languages.
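
To make the idea of entity-level localization concrete, here is a minimal sketch, not the authors' implementation. It assumes the entities in a problem have already been classified by type, and it swaps them for entries from a small local entity database using plain string substitution, whereas the paper describes a one-shot LLM-based localization step. The names `LOCAL_ENTITIES`, `CLASSIFIED`, `localize`, and `numbers_preserved` are hypothetical, the database values are illustrative only, and currency amounts are deliberately left unconverted.

```python
import re

# Hypothetical local entity database for one target locale (illustrative values only).
LOCAL_ENTITIES = {
    "person": ["Abebe", "Tigist"],
    "currency": ["birr"],
}

# Hypothetical output of an entity-classification step: source entity -> entity type.
CLASSIFIED = {"James": "person", "Emily": "person", "dollars": "currency"}


def localize(problem, classified, local_entities):
    """Replace each classified source entity with a local entity of the same type."""
    mapping = {}
    next_index = {}
    for entity, kind in classified.items():
        candidates = local_entities[kind]
        i = next_index.get(kind, 0)
        # Cycle through the candidates so distinct source names get distinct targets
        # as long as the database has enough entries for that type.
        mapping[entity] = candidates[i % len(candidates)]
        next_index[kind] = i + 1
    # Whole-word replacement so substrings of other words are left untouched.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], problem), mapping


def numbers_preserved(original, localized):
    """Simple quality check: localization must not alter any number in the problem."""
    def find(text):
        return re.findall(r"\d+(?:\.\d+)?", text)
    return find(original) == find(localized)


if __name__ == "__main__":
    source = "James has 5 dollars and gives Emily 2 dollars."
    localized, used = localize(source, CLASSIFIED, LOCAL_ENTITIES)
    print(localized)
    print(numbers_preserved(source, localized))
```

Because only the entity surface forms change, the arithmetic of the problem and its gold answer stay the same, which is what allows the original and localized variants to be compared directly when measuring robustness.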

Paper Structure

This paper contains 17 sections, 1 equation, 6 figures, 6 tables.

Figures (6)

  • Figure 1: An example showcasing an English math word problem, its direct translation, and an automatically localized version with culturally adapted entities. While the problem structure remains identical, large language models (LLMs) often fail to answer correctly when entity names or currencies are altered. This highlights a key limitation in current LLM robustness. In this paper, our goal is to audit models and rectify their robustness issues so that they remain consistent and accurate across such culturally grounded variations.
  • Figure 2: Human-Validated Comparison of Localization Quality Between Auto Localization and Direct Prompting (Gemini-1.5-pro). Our auto localization framework produces significantly better and more culturally appropriate localized outputs than direct prompting, which often fails to adapt entities to the target culture. This highlights the advantage of our method in achieving consistent and controllable localization.
  • Figure 3: Direct Translation (AfriMGSM) vs. Auto Localization (Localized-AfriMGSM) Numeric Match performance. We observe performance differences between the translated and localized benchmarks, indicating a lack of robustness in LLM mathematical ability for real-life culturally localized MWP variants.
  • Figure 4: Effect of Cultural Entities on English Benchmarks. We investigate whether replacing default English entities with culturally specific entities ($x_{ent}$; see Table \ref{stages}) influences model performance. The results show that across models and languages, the inclusion of local entities consistently shifts evaluation outcomes, indicating that cultural grounding plays a measurable role in benchmark performance.
  • Figure 5: Number of data samples without cultural entity replacements (out of 1500 selected), grouped by language in the training dataset.
  • ...and 1 more figures